Abstract

Statically typed programming languages allow earlier error checking, better
enforcement of disciplined programming styles, and generation of more efficient object
code than languages where all type consistency checks are performed at run time. 
However, even in statically typed languages, there is often the need to deal with data 
whose type cannot be determined at compile time. To handle such situations safely, 
we propose to add a type Dynamic whose values are pairs of a value v and a type tag 
T where v has the type denoted by T. Instances of Dynamic are built with an explicit 
tagging construct and inspected with a type safe typecase construct. 

This paper explores the syntax, operational semantics, and denotational semantics 
of a simple language including the type Dynamic. We give examples of how dynamically 
typed values can be used in programming. Then we discuss an operational semantics 
for our language and obtain a soundness theorem. We present two formulations of the 
denotational semantics of this language and relate them to the operational semantics. 
Finally, we consider the implications of polymorphism and some implementation issues. 

1 Introduction 

Statically typed programming languages allow earlier error checking, better enforcement of 
disciplined programming styles, and generation of more efficient object code than languages 
where all type consistency checks are performed at run time. However, even in statically 
typed languages, there is often the need to deal with data whose type cannot be determined 
at compile time. For example, full static typechecking of programs that exchange data with 
other programs or access persistent data is in general not possible. A certain amount of
dynamic checking must be performed in order to preserve type safety. 

Consider a program that reads a bitmap from a file and displays it on a screen. Probably 
the simplest way to do this is to store the bitmap externally as an exact binary image of its 
representation in memory. (For concreteness, assume that the bitmap is stored internally 
as a pair of integers followed by a rectangular array of booleans.) But if we take strong 
typing seriously, this is unacceptable: when the data in the file happens not to be two 
integers followed by a bit string of the appropriate length, the result can be chaos. The 
safety provided by static typing has been compromised. 

A better solution, also widely used, is to build explicit procedures for reading and 
writing bitmaps — storing them externally as character strings, say, and generating an 
exception if the contents of the file are not a legal representation of a bitmap. This 
amounts essentially to decreeing that there is exactly one data type external to programs 
and to requiring that all other types be encoded as instances of this single type. Strong 
typing can now be preserved — at the cost of some programming. But as software systems 
grow to include thousands of data types, each of which must be supplied with printing and 
reading routines, this approach becomes less and less attractive. What is really needed is 
a combination of the convenience of the first solution with the safety of the second. 

The key to such a solution is the observation that, as far as safety is concerned, the 
important feature of the second solution is not the details of the encoding of a bitmap as 
a string, but the fact that it is possible to generate an exception if a given string does not 
represent a bitmap. This amounts to a run-time check of the type correctness of the read 
operation. 

With this insight in hand, we can combine the two solutions above: the contents of a 
file should include both a binary representation of a data object and a representation of its 
type. The language can provide a single read operation that checks whether the type in 
the file matches the type declared for the receiving variable. In fact, rather than thinking 
of files as containing two pieces of information — a data object and its type — we can think 
of them as containing a pair of an object and its type. We introduce a new data type called 
Dynamic whose values are such pairs, and return to the view that all communication with 
the external world is in terms of objects of a single type — no longer String, but Dynamic. 
The read routine itself does no run-time checks, but simply returns a Dynamic. We provide 
a language construct, dynamic (with a lowercase "d"), for packaging a value together with 
its type into a Dynamic (which can then be "externed" to a file), and a typecase construct 
for inspecting the type tag of a given Dynamic. 

We might use typecase, for example, to display the entire contents of a directory where 
each file may be either a bitmap or a string: 

    foreach filename in openDir("MyDir") do
      let image = read(filename) in
        typecase image of
          (b: Bitmap)
            displayBitmap(b)
          (s: String)
            displayString(s)
          else
            displayString("<???>")
        end

This example can be generalized by making the directory itself into a Dynamic. Indeed, the 
entire file system could be based on Dynamic structures. Dynamic objects can also be used 
as the values exchanged during interprocess communication, thereby providing type safe 
interactions between processes. The Remote Procedure Call paradigm [4] uses essentially 
this mechanism. (Most RPC implementations optimize the conversions to and from the 
transport medium, so the Dynamic objects may exist only in principle.) 
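As a rough illustration of the idea, the following Python sketch models a Dynamic as a value paired with a type tag and mimics the directory-listing typecase shown earlier. The class name, the string tags, and the in-memory stand-in for the directory are assumptions made for this example only; they are not part of the language studied in this paper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Dynamic:
    """A value packaged together with a tag naming its type."""
    value: Any
    tag: str  # hypothetical tag names such as "Bitmap" or "String"


def describe(image: Dynamic) -> str:
    """Branch on the type tag, as the typecase in the directory example does."""
    if image.tag == "Bitmap":
        width, height, _bits = image.value
        return f"bitmap {width}x{height}"
    if image.tag == "String":
        return f"string {image.value!r}"
    return "<???>"  # the else branch


# Hypothetical in-memory stand-in for the directory "MyDir".
my_dir = {
    "logo": Dynamic((2, 1, [True, False]), "Bitmap"),
    "readme": Dynamic("hello", "String"),
    "mystery": Dynamic(3.14, "Real"),
}
report = [describe(d) for d in my_dir.values()]
```

The reader of a file never inspects raw bytes directly: every branch sees a value only after its tag has been checked.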

A number of systems already incorporate mechanisms similar to those we have
described. But so far these features have appeared in the context of full-scale language
designs, and seldom with a precise formal description of their meaning. No attention has 
been given to the more formal implications of dynamic typing, such as the problems of 
proving soundness and constructing models for languages with Dynamic. 

The purpose of this paper is to study the type Dynamic in isolation, from several angles. 
Section 2 reviews the history of dynamic typing in statically typed languages and describes 
some work related to ours. Section 3 introduces our version of the dynamic and typecase 
constructs and gives examples of programs that can be written with them. Section 4 
presents an operational semantics for our language and obtains a syntactic soundness
theorem. Section 5 investigates two semantic models for the same language and their relation
to the operational semantics. Section 6 outlines some preliminary work on extending our
theory to a polymorphic lambda-calculus with Dynamic. Section 7 discusses some of the
issues involved in implementing Dynamic efficiently.

2 History and Related Work 

Since at least the mid-1960s, a number of languages have included finite disjoint unions 
(e.g. Algol-68) or tagged variant records (e.g. Pascal). Both of these can be thought of
as "finite versions" of Dynamic: they allow values of different types to be manipulated 
uniformly as elements of a tagged variant type, with the restriction that the set of variants 
must be fixed in advance. Simula-67's subclass structure [5], on the other hand, can be 
thought of as an infinite disjoint union, essentially equivalent to Dynamic. The
Simula-67 INSPECT statement allows a program to determine at run time which subclass a value
belongs to, with an ELSE clause for subclasses that the program doesn't know or care 
about. 

CLU [20] is a later language that incorporates the idea of dynamic typing in a static 
context. It has a type any and a force construct that attempts to coerce an any into an 
instance of a given type, raising an exception if the coercion is not possible. Cedar/Mesa 
[19] provides very similar REFANY and TYPECASE. These features of Cedar/Mesa were 
carried over directly into Modula-2+ [33] and Modula-3 [8, 9]. In CLU and Cedar/Mesa, 
the primary motivation for including a dynamic type was to support programming idioms 
from LISP. 

Shaffert and Scheifler gave a formal definition [34] and denotational semantics [35] of 
CLU, including the type any and the force construct. This semantics relies on a domain of 
run time values where every value is tagged with its compile time type. Thus, the coercion
mapping a value of a known type into a value of type any is an identity function; force 
can always look at a value and read off its type. Our approach is more refined, since it 
distinguishes those values whose types may need to be examined at run time from those 
that can be stripped during compilation. Moreover, the semantic definition of CLU has 
apparently never been proved to be sound. In particular, it is not claimed that run time 
values actually occurring in the evaluation of a well-typed program are tagged with the 
types that they actually possess. The proof of a soundness result for CLU would probably 
require techniques similar to those developed in this paper. 

ML [16, 25, 26] and its relatives have shown more resistance to the incorporation of 
dynamic typing than languages in the Algol family. Probably this is because many of 
the uses of Dynamic in Algol-like languages are captured in ML by polymorphic types. 
Moreover, until recently ML has not been used for building software systems that deal 
much with persistent data. Still, there have been various proposals for extending ML with 
a dynamic type. Gordon seems to have thought of it first [15]; his ideas were later extended 
by Mycroft [28]. The innovation of allowing pattern variables in typecase expressions (see 
below) seems to originate with Mycroft. (Unfortunately, neither of these proposals was
published.) Recent versions of the CAML language [38] include features quite similar to 
our dynamic and typecase constructs. 

Amber [7], a language based on subtyping, includes a Dynamic type whose main use is 
for handling persistent data. In fact, the Amber system itself depends heavily on
dynamically typed values. For example, when a module is compiled, it is stored in the file system
as a single Dynamic object. Uniform use of Dynamic in such situations greatly simplifies 
Amber's implementation. 

The use of dynamically typed values for dealing with persistent data seems to be 
gaining in importance. Besides Amber, the mechanism is used pervasively in the
Modula-2+ programming environment. A REFANY structure can be "pickled" into a bytestring or
a file, "unpickled" later by another program, and inspected with TYPECASE. Dynamically 
typed objects have also been discussed recently in the database literature as an approach 
to dealing with persistent data in the context of statically typed database programming 
languages [1, 2, 10]. 

Recently, Thatte [36] has described a "quasi-static" type system based on the one
described here, where our explicit dynamic and typecase constructs are replaced by implicit
coercions and run time checks. 

3 Programming with Dynamic 

This section introduces the notation used in the rest of the paper: essentially Church's
simply typed lambda-calculus [12, 17] with a call-by-value reduction scheme [30], extended
with the type Dynamic and the dynamic and typecase constructs. We present a number
of example programs to establish the notation and illustrate its expressiveness.

Our fundamental constructs are λ-abstraction, application, conditionals, and arithmetic
on natural numbers. We write e ⇒ v to show that an expression e evaluates to a
value v, and e:T to show that an expression e has type T. For example,

    + : Nat→Nat→Nat
    5+3 ⇒ 8
    (λf:Nat→Nat.f(0)) : (Nat→Nat)→Nat
    (λf:Nat→Nat.f(0)) (λx:Nat.x+1) ⇒ 1

In order to be able to consider evaluation and typechecking separately, we define the 
behavior of our evaluator over all terms — not just over well-typed terms. (In a compiler for 
this language, the typechecking phase might strip away type annotations before passing 
programs to an interpreter or code generator. Our simple evaluator just ignores the type 
annotations.) 

Of course, evaluation of arbitrary terms may encounter run-time type errors such as 
trying to apply a number as if it were a function. The result of such computations is the 
distinguished value wrong: 

    (5 6) ⇒ wrong
    (λz:Nat.0) (5 6) ⇒ wrong

Note that in the second example a run-time error occurs even though the argument z
is never used in (λz.0): we evaluate expressions in applicative order. Also, note that
wrong is different from ⊥ (nontermination). This allows us to distinguish in the semantics
between programs that loop forever, which may be perfectly well typed, and programs that 
crash because of run-time type errors. 
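The behavior just described can be sketched as a small Python evaluator. This is only an illustration under simplifying assumptions: terms are Python ints or tuples, λ-bodies are represented by Python functions rather than by syntax, and arithmetic inside a body is delegated to Python. The names `WRONG` and `evaluate` are chosen for this example.

```python
WRONG = "wrong"  # distinguished result of a run-time type error


def evaluate(term):
    """Evaluate a closed term in applicative order, ignoring type annotations.

    Terms are ints, ("lam", body) where body is a Python function from an
    argument value to a term, or ("app", fun, arg).
    """
    if isinstance(term, int):
        return term
    if term[0] == "lam":
        return term
    if term[0] == "app":
        fun = evaluate(term[1])
        arg = evaluate(term[2])  # the argument is evaluated even if unused
        if isinstance(fun, int) or fun == WRONG or arg == WRONG:
            return WRONG         # applying a number, or propagating wrong
        return evaluate(fun[1](arg))
    raise ValueError(f"unknown term: {term!r}")


five_six = ("app", 5, 6)                          # (5 6)
const_zero = ("lam", lambda z: 0)                 # λz:Nat.0
successor = ("lam", lambda x: x + 1)              # λx:Nat.x+1
apply_to_zero = ("lam", lambda f: ("app", f, 0))  # λf:Nat→Nat.f(0)
```

Here `evaluate(("app", const_zero, five_six))` yields `WRONG` even though `z` is unused, because the argument is evaluated first. A looping term would simply never return a result (in Python it would eventually exhaust the recursion limit), which is a different outcome from returning `WRONG`.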

To make the examples in this section more interesting we also use strings, booleans, 
cartesian products, and recursive λ-expressions, all of which are omitted in the formal
parts of the paper. Strings are written in double quotes; || is the concatenation operator
on strings. Binary cartesian products are written with angle brackets; fst and snd are
projection functions returning the first and second components of a pair. Recursive lambda
expressions are written using the fixpoint operator rec, where we intend rec(f:U→T)λx:U.e
to denote the least-defined function f such that, informally, f = λx:U.e. For example,

    <λx:Nat.x+1,1> : (Nat→Nat)×Nat
    snd(<λx:Nat.x+1,1>) ⇒ 1
    (rec(f:Nat→Nat) λn:Nat.
       if n=0 then 1 else n*f(n-1)) (5)
    ⇒ 120

We show at the end of this section that recursive λ-expressions actually need not be
primitives of the language: they can be defined using Dynamic. 

Values of type Dynamic are built with the dynamic construct. The result of evaluating 
the expression dynamic e:T is a pair of a value v and a type tag T, where v is the result 
of evaluating e. The expression dynamic e:T has type Dynamic if e has type T. 

The typecase construct is used to examine the type tag of a Dynamic value. For 
example, the expression 
    λx:Dynamic.
      typecase x of
        (i:Nat) i+1
        else 0
      end

applied to (dynamic 1:Nat), evaluates to 2. The evaluator attempts to match the type
tag of x against the pattern Nat, succeeds, binds i to the value component of x, adds 1 to 
i, and returns the result. 

The patterns in the case branches need not fully specify the type they are to match: 
they may include "pattern variables," which match any subexpression in the type tag of 
the selector. The pattern variables are listed between parentheses at the beginning of each 
guard, indicating that they are bound in the branch. 

The full syntax of typecase is 
    typecase e_sel of
      (X_i) (x_i:T_i) e_i
      else e_else
    end

where e_sel, e_else, and e_i are expressions, x_i are variables, T_i are type expressions, and
X_i are lists of distinct type variables. (It will sometimes be convenient to treat the X_i as
a set rather than a list.) If any of the X_i are empty, their enclosing parentheses may be omitted.
The occurrences of type variables in X_i are binding and have scope over the whole branch,
that is, over both T_i and e_i.

If the type tag of a typecase selector matches more than one guard, the first matching 
branch is executed. There are other possible choices here. For instance, we could imagine 
requiring that the patterns form an "exclusive and exhaustive" covering of the space of 
type expressions so that a given type tag always matches exactly one pattern [28]. 
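One way to picture the matching step is the following Python sketch. It assumes a hypothetical encoding in which a Dynamic is a `(value, tag)` tuple, base types are strings, and compound types are tuples such as `("->", T, U)` and `("x", T, U)`. The functions `match` and `typecase` are illustrative names, and the treatment of a pattern variable that occurs twice (both occurrences must match the same type) is an assumption of this sketch.

```python
def match(pattern, tag, pattern_vars, subst=None):
    """Match a type pattern against a closed type tag.

    Returns a substitution (dict from pattern variables to types),
    or None if the pattern does not match.
    """
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern in pattern_vars:
        if pattern in subst and subst[pattern] != tag:
            return None
        subst[pattern] = tag
        return subst
    if isinstance(pattern, str) or isinstance(tag, str):
        return subst if pattern == tag else None
    if pattern[0] != tag[0]:
        return None
    for sub_pattern, sub_tag in zip(pattern[1:], tag[1:]):
        subst = match(sub_pattern, sub_tag, pattern_vars, subst)
        if subst is None:
            return None
    return subst


def typecase(dyn, branches, default):
    """Run the first branch whose pattern matches the tag of dyn."""
    value, tag = dyn
    for pattern_vars, pattern, body in branches:
        subst = match(pattern, tag, pattern_vars)
        if subst is not None:
            return body(value, subst)
    return default()


def increment(x):
    # λx:Dynamic. typecase x of (i:Nat) i+1 else 0 end
    return typecase(x, [((), "Nat", lambda i, _s: i + 1)], lambda: 0)
```

Because branches are tried in order, a catch-all pattern such as `(X) (v: X)` placed first shadows every later branch.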

One example using dynamic values is a function that returns a printable string
representation of any dynamic value:

    rec(tostring: Dynamic→String)
      λdv:Dynamic.
        typecase dv of
          (v: String) "\"" || v || "\""
          (v: Nat) natToStr(v)
          (X,Y) (v: X→Y) "<function>"
          (X,Y) (v: X×Y)
            "<" || tostring(dynamic fst(v):X) || ","
            || tostring(dynamic snd(v):Y) || ">"
          (v: Dynamic)
            "dynamic " || tostring(v)
          else "<unknown>"
        end

The case for pairs illustrates a subtle point. It uses a pattern to match any pair, and 
then calls the tostring function recursively to convert the components. To do this, it 
must package them into new dynamic values by tagging them with their types. This is
possible because the variables X and Y are bound at run time to the appropriate types. 
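The re-tagging of pair components can be seen in a direct Python transcription of tostring. The encoding is an assumption of this sketch: a Dynamic is a `(value, tag)` tuple, base types are strings, and `("->", T, U)` and `("x", T, U)` stand for arrow and product types.

```python
def tostring(dv):
    """Printable representation of a Dynamic, following the typecase above."""
    value, tag = dv
    if tag == "String":
        return '"' + value + '"'
    if tag == "Nat":
        return str(value)
    if isinstance(tag, tuple) and tag[0] == "->":
        return "<function>"
    if isinstance(tag, tuple) and tag[0] == "x":
        _, x, y = tag  # X and Y are bound at run time to the component types
        first = tostring((value[0], x))   # dynamic fst(v):X
        second = tostring((value[1], y))  # dynamic snd(v):Y
        return "<" + first + "," + second + ">"
    if tag == "Dynamic":
        return "dynamic " + tostring(value)
    return "<unknown>"
```

For instance, `tostring(((1, "a"), ("x", "Nat", "String")))` builds two fresh Dynamic values, one tagged `"Nat"` and one tagged `"String"`, before recursing.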

Since the type tag is part of the run-time representation of a dynamic value, the case for 
Dynamic should probably return a string representation not only of the tagged value but 
of the type tag itself. This is straightforward, using an auxiliary function typetostring 
with the same structure as tostring. 

    rec(typetostring: Dynamic→String)
      λdv:Dynamic.
        typecase dv of
          (v: String) "String"
          (v: Nat) "Nat"
          (X,Y) (v: X→Y) "<function>"
          (X,Y) (v: X×Y)
            typetostring(dynamic fst(v):X)
            || "×"
            || typetostring(dynamic snd(v):Y)
          (v: Dynamic) "Dynamic"
          else "<unknown>"
        end

Neither tostring nor typetostring quite does its job: for example, when tostring
gets to a function, it stops without giving any more information about the function. It can 
do no better, given the mechanisms we have described, since there is no effective way to 
get from a function value to an element of its domain or codomain. This limitation even 
precludes using typetostring to show the domain and codomain types of the function, 
since the argument to typetostring must be a value, not just a disembodied type. 

It would be possible to add another mechanism to the language, providing a way 
of "unpackaging" the type tag of a Dynamic into a data structure that could then be 
examined by the program. (Amber [7] and Cedar/Mesa [19] have this feature.) Although 
this would be a convenient way to implement operations like type printing — which may 
be important in practice — we believe that most of the theoretical interest of Dynamic lies 
in the interaction between statically and dynamically checked parts of the language that 
the typecase expression allows. Under the proposed extension, a function could behave 
differently depending on the type tag of a dynamic value passed as a parameter, but the 
type of its result could not be affected without giving up static typechecking. 

Another example, demonstrating the use of nested typecase expressions, is a function 
that applies its first argument to its second argument, after checking that the application 
is correctly typed. Both arguments are passed as dynamic values, and the result is a new 
dynamic value. When the application fails, the type tag of the result will be String and 
its value part will be "Error". (In a richer language we could raise an exception in this 
case.) 

    λdf:Dynamic. λde:Dynamic.
      typecase df of
        (X,Y) (f: X→Y)
          typecase de of
            (e: X) dynamic f(e):Y
            else dynamic "Error":String
          end
        else dynamic "Error":String
      end

Note that in the first guard of the inner typecase, X is not listed as a bound pattern 
variable. It is not intended to match any type whatsoever, but only the domain type of f.
Therefore, it retains its binding from the outer pattern, making it a constant as far as the 
inner typecase is concerned. 
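A Python rendering of this function makes the role of X explicit: the inner check compares the argument's tag against the domain type already extracted from the function's tag. As before, the tuple encoding of Dynamic values and type tags, and the name `dyn_apply`, are assumptions of this sketch.

```python
ERROR = ("Error", "String")  # dynamic "Error":String


def dyn_apply(df, de):
    """Apply a dynamically typed function to a dynamically typed argument."""
    f, f_tag = df
    e, e_tag = de
    if isinstance(f_tag, tuple) and f_tag[0] == "->":
        _, x, y = f_tag          # outer pattern (X,Y) (f: X→Y)
        if e_tag == x:           # inner pattern (e: X); X is now a constant
            return (f(e), y)     # dynamic f(e):Y
        return ERROR
    return ERROR


succ_dyn = (lambda n: n + 1, ("->", "Nat", "Nat"))
```

With these definitions, `dyn_apply(succ_dyn, (1, "Nat"))` produces a Dynamic tagged `"Nat"`, while an argument tagged `"String"` produces the error value.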

Readers may enjoy the exercise of defining a similar function that takes two functions 
as dynamic values and returns their composition as a dynamic value. 

In contrast to some languages with features similar to Dynamic (for example,
Modula-2+ [33]), the set of type tags involved in a computation cannot be computed statically:
our dynamic expressions can cause the creation of new tags at run time. A simple example 
of this is a function that takes a dynamic value and returns a Dynamic whose value part 
is a pair, both of whose components are equal to the value part of the original dynamic 
value: 

    λdx:Dynamic.
      typecase dx of
        (X) (x: X)
          dynamic <x,x>: X×X
        else dx
      end

It is easy to see that the type tag on the dynamic value returned by this function 
must be constructed at run time, rather than simply being chosen from a finite set of tags 
generated by the compiler. 
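To make the point concrete, the sketch below iterates a Python version of this function and records the size of the tag after each step. The tuple encoding of tags and the helper `tag_size` are assumptions introduced for the illustration.

```python
def duplicate(dx):
    """(X) (x: X)  dynamic <x,x>: X×X   (this pattern matches every tag)."""
    value, tag = dx
    return ((value, value), ("x", tag, tag))


def tag_size(tag):
    """Number of type constructors in a tag."""
    if isinstance(tag, str):
        return 1
    return 1 + tag_size(tag[1]) + tag_size(tag[2])


dx = (0, "Nat")
sizes = []
for _ in range(4):
    sizes.append(tag_size(dx[1]))
    dx = duplicate(dx)
```

Each application yields a tag that did not exist before, so no fixed table of tags prepared at compile time could cover every run.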

Our last application of Dynamic is more substantial. We show that it can be used 
to build a fixpoint operator, allowing recursive computations to be expressed in the
language even without the rec construct. It is well known that fixpoint operators cannot
be expressed in the ordinary simply typed lambda-calculus. (This follows from the strong
normalization property [17, p. 163].) However, by hiding a certain parameter inside a
dynamic value, smuggling it past the type system, and unpackaging it again where it is 
needed, we can write a well-typed version in our language. 

A fixpoint of a function f is an argument x for which f(x) = x (our use of the equality
sign here is informal). A fixpoint operator fix is a function that returns a fixpoint of a
function f when applied to f:

    fix f = f (fix f).

In call-by-value lambda-calculi, an extensional version of this property must be used
instead: for any argument a,

    (fix f) a = f (fix f) a

One function with this property (a call-by-value version of the standard Y combinator
[3, p. 131], [30]) can be expressed in an untyped variant of our notation by:

    fix = λf. d d

where

    d = λx. λz. (f (x x)) z.

To see that (fix f) a = f (fix f) a for any function f and argument a, we calculate
as follows.

    (fix f) a = ((λf. d d) f) a
              = (d d) a
              = (λz. (f (d d)) z) a
              = (f (d d)) a
              = (f ((λf. d d) f)) a
              = (f (fix f)) a
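Since Python is itself untyped and call-by-value, the untyped definition can be transcribed almost literally. The factorial function used to exercise it is our own example.

```python
def fix(f):
    """fix = λf. d d   where   d = λx. λz. (f (x x)) z."""
    d = lambda x: lambda z: f(x(x))(z)
    return d(d)


# Factorial written without explicit recursion: `self` stands for fix f.
factorial = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

The inner `λz` matters: `d(d)` returns a closure immediately instead of unfolding `f(d(d))` forever, which is why this variant works under applicative-order evaluation.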

To build something similar in the typed language, we need to do a bit more work. 
Rather than a single fixpoint function, we have to build a family of functions (one for each 
arrow type). That is, for each arrow type T→U we define a function fix_{T→U} whose type
is ((T→U)→(T→U))→(T→U). Unfortunately, there is no way to obtain fix_{T→U} by just
filling in suitable type declarations in the untyped fix given above. We need to build it 
in a more roundabout way. 

First, we need an expression a_T for each type T. (It does not matter what the expressions
are; we need to know only that there is one for every type.) Define:

    a_Nat     = 0
    a_String  = ""
    a_{T×U}   = <a_T, a_U>
    a_{T→U}   = λx:T. a_U
    a_Dynamic = dynamic 0:Nat

Next, we build a family of "embedding" functions from each type T into Dynamic, and 
corresponding "projection" functions from Dynamic to T: 

    emb_T  = λx:T. dynamic x:T
    proj_T = λy:Dynamic.
               typecase y of
                 (z:T) z
                 else a_T
               end

It is easy to see that if an expression e of type T evaluates to some value v, then so does
proj_T(emb_T(e)).

Now we are ready to construct fix_{T→U}. Abbreviate:

    emb  = emb_{Dynamic→(T→U)}
    proj = proj_{Dynamic→(T→U)}
    d    = λx:Dynamic. λz:T. f ((proj x) x) z

To see that d is well-typed, assume that f has type (T→U)→(T→U). The type of d works
out to be Dynamic→(T→U). Then

    fix_{T→U} = λf:((T→U)→(T→U)). d (emb d)

has type ((T→U)→(T→U))→(T→U), as required, and has the correct behavior.
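The construction can be traced in Python, with the caveat that Python does not check the types, so the sketch only mirrors the run-time behavior: the self-application goes through an explicit tag check instead of applying x to itself directly. The tuple encoding of types and Dynamic values, and the function names, are assumptions of this example.

```python
def default(t):
    """The expression a_T: an arbitrary value of each supported type T."""
    if t == "Nat":
        return 0
    if t == "String":
        return ""
    if t == "Dynamic":
        return (0, "Nat")                 # dynamic 0:Nat
    if t[0] == "x":
        return (default(t[1]), default(t[2]))
    result = default(t[2])                # t is ("->", T, U)
    return lambda x: result


def emb(t):
    """emb_T = λx:T. dynamic x:T"""
    return lambda x: (x, t)


def proj(t):
    """proj_T = λy:Dynamic. typecase y of (z:T) z else a_T end"""
    return lambda y: y[0] if y[1] == t else default(t)


def fix(t, u, f):
    """fix_{T→U} f = d (emb d), with d = λx:Dynamic. λz:T. f ((proj x) x) z."""
    d_type = ("->", "Dynamic", ("->", t, u))
    emb_d, proj_d = emb(d_type), proj(d_type)
    d = lambda x: lambda z: f(proj_d(x)(x))(z)
    return d(emb_d(d))


factorial = fix("Nat", "Nat",
                lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

The else branch of `proj` is never taken during the recursion, since `d` is always embedded with exactly the tag that `proj_d` expects; it exists only so that the projection is total.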

4 Operational Semantics 

We now formally define the syntax of the simply typed lambda-calculus with Dynamic and
give operational rules for typechecking and evaluation. 

4.1 Notation 

TVar is a countable set of type variable identifiers. TExp is the class of type expressions 
defined over these by the following BNF equation, where T and U range over TExp and X 
ranges over TVar. 

    T ::= Nat
        | X
        | T → U
        | Dynamic

Similarly, Var is a countable set of variables and OpenExp is the class of open expressions 
defined by the following equation, where e ranges over OpenExp, x over Var, and T over 
TExp: 

    e ::= x
        | wrong
        | λx:T.e_body
        | e_fun(e_arg)
        | 0
        | succ e_nat
        | test e_nat 0:e_zero succ(x):e_succ
        | dynamic e_body:T
        | typecase e_sel of
            (X_i) (x_i:T_i) e_i
            else e_else
          end

Recall that X_i denotes a list of distinct type variables, and that if the list is empty the
enclosing parentheses may be omitted.

This is a simpler language than we used in the examples. We have omitted strings,
booleans, cartesian products, and built-in recursive λ-expressions. The natural numbers,
our only built-in datatype, are presented by 0, succ, and test. The test
construct helps reduce the low-level clutter in our definitions by subsuming the usual
if...then...else... construct, test for zero, predecessor function, and boolean datatype
into a single construct. It is based on Martin-Löf's elimination rule for natural numbers
[23].

We give special names to certain subsets of TExp and OpenExp. FTV(e) is the set of 
free type variables in e. FV(e) is the set of free variables in e. ClosedExp denotes the 
closed expressions; Exp denotes the expressions with no free type variables (but possibly 
with free variables); TypeCode denotes the closed type expressions. When we write just 
"expression," we mean an expression with no free type variables. 

Evaluation is taken to be a relation between expressions and expressions (rather
than between expressions and some other domain of values). We distinguish a set
Value ⊆ ClosedExp of expressions "in canonical form." The elements of Value are
defined inductively: wrong is in canonical form; 0, succ 0, succ (succ 0), ... are in
canonical form; an expression (λx:T.e_body) is in canonical form if it is closed; an expression
dynamic e_body:T is in canonical form if T is closed and e_body is in canonical form and
different from wrong.
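The inductive definition of Value can be read as a predicate. The Python sketch below assumes a hypothetical tuple encoding of expressions (`("wrong",)`, `("zero",)`, `("succ", e)`, `("lam", x, T, body)`, `("dynamic", e, T)`) and of types (strings for Nat and Dynamic, `("->", T, U)`, and `("tvar", name)`). It also assumes its argument is already a closed expression, so it does not recheck closedness of λ-bodies.

```python
WRONG, ZERO = ("wrong",), ("zero",)


def closed_type(t):
    """True if the type expression t contains no type variables."""
    if isinstance(t, str):               # "Nat" or "Dynamic"
        return True
    if t[0] == "tvar":                   # ("tvar", name)
        return False
    return closed_type(t[1]) and closed_type(t[2])   # ("->", T, U)


def is_numeral(e):
    """True for 0, succ 0, succ (succ 0), ..."""
    while e[0] == "succ":
        e = e[1]
    return e == ZERO


def is_canonical(e):
    """Membership in Value, assuming e is a closed expression."""
    if e == WRONG or is_numeral(e):
        return True
    if e[0] == "lam":                    # closed λ-expressions are canonical
        return True
    if e[0] == "dynamic":
        body, tag = e[1], e[2]
        return closed_type(tag) and body != WRONG and is_canonical(body)
    return False
```

Note that `("dynamic", WRONG, "Nat")` is rejected: a Dynamic never packages the error value.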

A substitution σ is a finite function from type variables to closed type expressions,
written [X ← T, Y ← U, ...]. Subst denotes the set of all substitutions. Subst_{X_i} denotes
the set of substitutions whose domain is X_i. We use a similar notation for substitution of
canonical expressions for free variables in expressions.

A type environment is a finite function from variables to closed type expressions. To
denote the modification of a type environment TE by a binding of x to T, we write TE[x ← T].
The empty type environment is denoted by ∅.

We consistently use certain variables to range over particular classes of objects. The 
metavariables x, y, and z range over variables in the language. (They are also sometimes 
used as actual variables in program examples.) The metavariable e ranges over
expressions. Similarly, X, Y, and Z range over type variables and T, U, V, and W range over type
expressions. The letter σ ranges over substitutions. TE ranges over type environments.
Finally, v and w range over canonical expressions.

These definitions and conventions are summarized in Figures 1 and 2.

4.2 Typechecking

Our notation for describing typechecking and evaluation is a form of "structural operational 
semantics" [31]. The typing and evaluation functions are specified as systems of inference 
rules; showing that an expression has a given type or reduces to a given value amounts 
precisely to giving a proof of this fact using the rules. Because the inference rules are 
similar to those used in systems for natural deduction in logic, this style of description has 
also come to be known as "natural semantics" [18]. 

    Nat                                                 numbers
    Var                                                 variables
    TVar                                                type variables
    TExp                                                type expressions
    TypeCode    = {T ∈ TExp | FTV(T) = ∅}               closed types
    OpenExp                                             open expressions
    ClosedExp   = {e ∈ OpenExp | FV(e) = FTV(e) = ∅}    closed expressions
    Exp         = {e ∈ OpenExp | FTV(e) = ∅}            expressions
    Value       = {e ∈ OpenExp | e in canonical form}   canonical expressions
    Subst       = TVar → TypeCode                       substitutions
    Subst_{X_i} = TVar → TypeCode                       substitutions with domain X_i
    TEnv        = Var → TypeCode                        type environments

Figure 1: Summary of Basic Definitions

    x, y, z       variables
    e             expressions
    v, w          canonical expressions
    X, Y, Z       type variables
    T, U, V, W    type expressions
    σ             substitutions
    TE            type environments

Figure 2: Summary of Naming Conventions



The rules closely follow the structure of expressions, and incorporate a strong notion of 
computation. To compute a type for e_fun(e_arg), for example, we first attempt to compute
types for its subterms e_fun and e_arg and then, if we are successful, to combine the results.
This exactly mimics the sequence of events we might observe inside a typechecker for the 
language. 

The formalism extends fairly easily to describing a variety of programming language 
features like assignment statements and exceptions. This breadth of coverage and
"operational style" makes the notation a good one for specifying comparatively rich languages
like Standard ML [26]. A group at INRIA has built a system for directly interpreting 
formal specifications written in a similar notation [6, 13, 14]. 

The rules below define the situations in which the judgement "expression e has type
T" is valid under assumptions TE. This is written "TE ⊢ e : T".

The first rule says that a variable identifier has whatever type is given for it in the type 
environment. If it is unbound in the present type environment, then the rules simply fail 
to derive any type. (Technically, the clause "x ∈ Dom(TE)" is not a premise but a side
condition that determines when the rule is applicable.)

      x ∈ Dom(TE)
    ----------------
    TE ⊢ x : TE(x)



A λ-expression must have an arrow type. The argument type is given explicitly by
the annotation on the bound variable. To compute the result type, we assume that the 
bound variable has the declared type, and attempt to derive a type for the body under 
this assumption. 



    TE[x ← U] ⊢ e_body : T
    --------------------------
    TE ⊢ λx:U.e_body : (U→T)

A well-typed function application must consist of an expression of some arrow type 
applied to another expression, whose type is the same as the argument type of the first 
expression. 

    TE ⊢ e_fun : (U→T)
    TE ⊢ e_arg : U
    ----------------------
    TE ⊢ e_fun(e_arg) : T

The constant 0 has type Nat. 
    TE ⊢ 0 : Nat



A succ expression has type Nat if its body does. 

    TE ⊢ e_nat : Nat
    ----------------------
    TE ⊢ succ e_nat : Nat

A test expression has type T if its selector has type Nat and both of its arms have 
type T. The type of the second arm is derived in an environment where the variable x has 
type Nat. 

    TE ⊢ e_nat : Nat
    TE ⊢ e_zero : T
    TE[x ← Nat] ⊢ e_succ : T
    ----------------------------------------------
    TE ⊢ (test e_nat 0:e_zero succ(x):e_succ) : T

A dynamic expression is well-typed if the body actually has the type claimed for it. 
    TE ⊢ e_body : T
    ----------------------------------
    TE ⊢ (dynamic e_body:T) : Dynamic

The typecase construct is a bit more complicated. In order for an expression of the
form (typecase e_sel of ... (X_i) (x_i:T_i) e_i ... end) to have a type T, three
conditions must be met: First, the selector e_sel must have type Dynamic. Second, for every
possible substitution σ of typecodes for the pattern variables X_i, the body e_i of each branch
must have type T. Third, the else arm must also have type T.

The second premise is quantified over all substitutions σ ∈ Subst_{X_i}. Strictly speaking,
there are no inference rules that allow us to draw conclusions quantified over an infinite set,
so a proof of this premise requires an infinite number of separate derivations. Such infinitary
derivations present no theoretical difficulties (in fact, they make the rule system easier to
reason about), but a typechecker based naively on these rules would have poor performance.
However, our rules can be replaced by a finitary system using skolem constants that derives
exactly the same typing judgements.

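$$\frac{\begin{array}{c} TE \vdash e_{\mathrm{sel}} : \mathtt{Dynamic} \\ \forall i,\ \forall \sigma \in Subst_{\vec{X}_i}.\ \ TE[x_i \leftarrow T_i\sigma] \vdash e_i\sigma : T \\ TE \vdash e_{\mathrm{else}} : T \end{array}}{TE \vdash (\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of}\ \ldots\ (\vec{X}_i)\ (x_i{:}T_i)\ e_i\ \ldots\ \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}) : T}$$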

Finally, note that the expression wrong is assigned no type. It is the only syntactic 
form in the language with no associated typing rule. 
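The rules above, apart from the one for typecase, can be read directly as a recursive procedure. The following Python sketch is an illustration only and is not taken from the paper: the tuple encoding of expressions and types, the exception `IllTyped`, and the function name `typeof` are all assumptions made for this example. Typecase is omitted because a faithful checker would need the finitary (skolem constant) formulation mentioned above.

```python
# Hypothetical encoding (an assumption of this sketch, not the paper's):
#   types:       "Nat", "Dynamic", or ("->", U, T)
#   expressions: ("var", x), ("lam", x, U, body), ("app", fun, arg), ("zero",),
#                ("succ", e), ("test", sel, e_zero, x, e_succ), ("dyn", body, T),
#                ("wrong",)

class IllTyped(Exception):
    """Raised when no typing rule applies."""

def typeof(te, e):
    """Return the type of expression e under type environment te (a dict)."""
    tag = e[0]
    if tag == "var":
        if e[1] not in te:
            raise IllTyped("unbound variable " + e[1])
        return te[e[1]]
    if tag == "lam":
        _, x, u, body = e
        # Type the body assuming the bound variable has its declared type.
        return ("->", u, typeof({**te, x: u}, body))
    if tag == "app":
        tf, ta = typeof(te, e[1]), typeof(te, e[2])
        if isinstance(tf, tuple) and tf[0] == "->" and tf[1] == ta:
            return tf[2]
        raise IllTyped("operator is not a function of the argument's type")
    if tag == "zero":
        return "Nat"
    if tag == "succ":
        if typeof(te, e[1]) == "Nat":
            return "Nat"
        raise IllTyped("succ applied to a non-Nat")
    if tag == "test":
        _, sel, e_zero, x, e_succ = e
        if typeof(te, sel) != "Nat":
            raise IllTyped("test selector is not a Nat")
        t_zero = typeof(te, e_zero)
        t_succ = typeof({**te, x: "Nat"}, e_succ)  # x : Nat in the second arm
        if t_zero != t_succ:
            raise IllTyped("arms of test have different types")
        return t_zero
    if tag == "dyn":
        # The body must actually have the type claimed for it.
        if typeof(te, e[1]) == e[2]:
            return "Dynamic"
        raise IllTyped("dynamic body does not have the claimed type")
    # No rule for any other form; in particular, wrong is assigned no type.
    raise IllTyped("no typing rule for " + tag)
```

For instance, under this encoding `typeof({}, ("lam", "x", "Nat", ("var", "x")))` yields the arrow type from Nat to Nat, while `typeof({}, ("wrong",))` raises `IllTyped`.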

4.3 Evaluation 

The evaluation rules are given in the same notation as the typechecking rules. We define 
the judgement "closed expression e reduces to canonical expression v," written "$\vdash e \Rightarrow v$,"
by giving rules for each syntactic construct in the language. In general, there is one rule 
for the normal case, plus one or two others specifying that the expression reduces to wrong 
under certain conditions. 

In this style of semantic description, there is no explicit representation of a
nonterminating computation. Whereas in standard denotational semantics an expression that
loops forever has the value $\bot$ (bottom), our evaluation rules simply fail to derive any
result whatsoever.

When the evaluation of an expression encounters a run-time error like trying to apply 
a number as if it were a function, the value wrong is derived as the expression's value. The 
evaluation rules preserve wrong. 

There is no rule for evaluating a variable: evaluation is defined only over closed expres- 
sions. Parameter substitution is performed immediately during function application. 

The constant wrong is in canonical form.

$$\vdash \mathtt{wrong} \Rightarrow \mathtt{wrong}$$



Every λ-expression is in canonical form.

$$\vdash \lambda x{:}T.\, e_{\mathrm{body}} \Rightarrow \lambda x{:}T.\, e_{\mathrm{body}}$$

We have chosen a call-by-value (applicative-order) evaluation strategy: to evaluate a
function application, the expression being applied must be reduced to a canonical
expression beginning with λ and the argument expression must be reduced to some legal value,
that is, its computation must terminate and should not produce wrong. If one of these
computations results in wrong, the application itself reduces immediately to wrong.
Otherwise the argument is substituted for the parameter variable in the λ body, which is then
evaluated under this binding.

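$$\frac{\vdash e_{\mathrm{fun}} \Rightarrow \lambda x{:}T.\, e_{\mathrm{body}} \qquad \vdash e_{\mathrm{arg}} \Rightarrow w \ \ (w \neq \mathtt{wrong}) \qquad \vdash e_{\mathrm{body}}[x \leftarrow w] \Rightarrow v}{\vdash e_{\mathrm{fun}}(e_{\mathrm{arg}}) \Rightarrow v}$$

$$\frac{\vdash e_{\mathrm{fun}} \Rightarrow w \ \ (w \text{ not of the form } (\lambda x{:}T.\, e_{\mathrm{body}}))}{\vdash e_{\mathrm{fun}}(e_{\mathrm{arg}}) \Rightarrow \mathtt{wrong}}$$

$$\frac{\vdash e_{\mathrm{fun}} \Rightarrow w \ \ (w = (\lambda x{:}T.\, e_{\mathrm{body}})) \qquad \vdash e_{\mathrm{arg}} \Rightarrow \mathtt{wrong}}{\vdash e_{\mathrm{fun}}(e_{\mathrm{arg}}) \Rightarrow \mathtt{wrong}}$$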

The constant 0 is in canonical form. 
$$\vdash \mathtt{0} \Rightarrow \mathtt{0}$$

A succ expression is in canonical form when its body is a canonical number (that is, 
an expression of the form 0 or succ n, where n is a canonical number). It is evaluated by 
attempting to evaluate the body to a canonical value v, returning wrong if the result is 
anything but a number and otherwise returning succ applied to v. 

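$$\frac{\vdash e_{\mathrm{nat}} \Rightarrow v \ \ (v \text{ a canonical number})}{\vdash \mathtt{succ}\ e_{\mathrm{nat}} \Rightarrow \mathtt{succ}\ v}$$

$$\frac{\vdash e_{\mathrm{nat}} \Rightarrow v \ \ (v \text{ not a canonical number})}{\vdash \mathtt{succ}\ e_{\mathrm{nat}} \Rightarrow \mathtt{wrong}}$$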

A test expression is evaluated by evaluating its selector, returning wrong if the result is 
not a number, and otherwise evaluating one or the other of the arms depending on whether 
the selector is zero or a positive number. In the latter case, the variable x is bound inside 
the arm to the predecessor of the selector. 
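$$\frac{\vdash e_{\mathrm{nat}} \Rightarrow \mathtt{0} \qquad \vdash e_{\mathrm{zero}} \Rightarrow v}{\vdash (\mathtt{test}\ e_{\mathrm{nat}}\ \mathtt{0}{:}\,e_{\mathrm{zero}}\ \mathtt{succ}(x){:}\,e_{\mathrm{succ}}) \Rightarrow v}$$

$$\frac{\vdash e_{\mathrm{nat}} \Rightarrow \mathtt{succ}\ w \qquad \vdash e_{\mathrm{succ}}[x \leftarrow w] \Rightarrow v}{\vdash (\mathtt{test}\ e_{\mathrm{nat}}\ \mathtt{0}{:}\,e_{\mathrm{zero}}\ \mathtt{succ}(x){:}\,e_{\mathrm{succ}}) \Rightarrow v}$$

$$\frac{\vdash e_{\mathrm{nat}} \Rightarrow w \ \ (w \text{ not a canonical number})}{\vdash (\mathtt{test}\ e_{\mathrm{nat}}\ \mathtt{0}{:}\,e_{\mathrm{zero}}\ \mathtt{succ}(x){:}\,e_{\mathrm{succ}}) \Rightarrow \mathtt{wrong}}$$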

A dynamic expression is evaluated by evaluating its body. If the body reduces to wrong 
then so does the whole dynamic expression. 

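$$\frac{\vdash e_{\mathrm{body}} \Rightarrow w \ \ (w \neq \mathtt{wrong})}{\vdash (\mathtt{dynamic}\ e_{\mathrm{body}}{:}T) \Rightarrow \mathtt{dynamic}\ w{:}T}$$

$$\frac{\vdash e_{\mathrm{body}} \Rightarrow \mathtt{wrong}}{\vdash (\mathtt{dynamic}\ e_{\mathrm{body}}{:}T) \Rightarrow \mathtt{wrong}}$$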

A typecase expression is evaluated by evaluating its selector, returning wrong
immediately if this produces wrong or anything else that is not a dynamic value, and otherwise
trying to match the type tag of the selector value against the guards of the typecase. The
function match has the job of matching a run-time typecode T against a pattern
expression U with free variables. If there is a substitution $\sigma$ such that $T = U\sigma$, then
$match(T, U) = \sigma$. (For the simple type expressions we are dealing with here, $\sigma$ is unique if it exists.)
Otherwise, match(T, U) fails. Section 7.2 discusses the implementation of match.
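As an illustration of the specification just given, here is a minimal Python sketch of first-order matching. It is not the implementation discussed in Section 7.2; the encoding of typecodes as strings and `("->", U, T)` tuples, the explicit `pattern_vars` argument, and the convention of returning `None` on failure are assumptions of this example.

```python
def match(t, pattern, pattern_vars):
    """Match the closed typecode t against pattern.

    Returns a dict sigma such that applying sigma to pattern gives t,
    or None if no such substitution exists.
    Typecodes are "Nat", "Dynamic", or ("->", U, T); the names listed in
    pattern_vars are the pattern variables (hypothetical encoding).
    """
    sigma = {}

    def go(t, p):
        if isinstance(p, str) and p in pattern_vars:
            if p in sigma:
                # A repeated pattern variable must match the same typecode.
                return sigma[p] == t
            sigma[p] = t
            return True
        if isinstance(p, tuple) and isinstance(t, tuple):
            return p[0] == t[0] == "->" and go(t[1], p[1]) and go(t[2], p[2])
        # Base types must be identical; a base type never matches an arrow.
        return p == t

    return sigma if go(t, pattern) else None
```

Because a successful match with no pattern variables returns the empty dict, callers should test the result with `is None` rather than by truthiness.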

The branches are tried in turn until one is found for which match succeeds. The 
substitution returned by match is applied to the body of the branch. Then the selector's 
value component is substituted for the parameter variable in the body, and the resulting 
expression is evaluated. (As in the rule for application, we avoid introducing run-time 
environments by immediately substituting the bound variable $x_i$ and pattern variables $\vec{X}_i$
into the body of the matching branch.) The result of evaluating the body becomes the 
value for the whole typecase. 

If no guard matches the selector tag, the else body is evaluated instead. 

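$$\frac{\begin{array}{c} \vdash e_{\mathrm{sel}} \Rightarrow \mathtt{dynamic}\ w{:}T \\ \forall j < k.\ match(T, T_j) \text{ fails} \\ match(T, T_k) = \sigma \\ \vdash e_k\sigma[x_k \leftarrow w] \Rightarrow v \end{array}}{\vdash (\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of}\ \ldots\ (\vec{X}_i)\ (x_i{:}T_i)\ e_i\ \ldots\ \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}) \Rightarrow v}$$

$$\frac{\begin{array}{c} \vdash e_{\mathrm{sel}} \Rightarrow \mathtt{dynamic}\ w{:}T \\ \forall k.\ match(T, T_k) \text{ fails} \\ \vdash e_{\mathrm{else}} \Rightarrow v \end{array}}{\vdash (\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of}\ \ldots\ (\vec{X}_i)\ (x_i{:}T_i)\ e_i\ \ldots\ \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}) \Rightarrow v}$$

$$\frac{\vdash e_{\mathrm{sel}} \Rightarrow v \ \ (v \text{ not of the form } (\mathtt{dynamic}\ w{:}T))}{\vdash (\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of}\ \ldots\ (\vec{X}_i)\ (x_i{:}T_i)\ e_i\ \ldots\ \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}) \Rightarrow \mathtt{wrong}}$$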
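To make the shape of these rules concrete, the following Python sketch implements the evaluation judgement for every construct except typecase, using substitution rather than run-time environments as the rules do. It is an illustration under stated assumptions, not the paper's implementation: the tuple encoding of expressions and the names `subst`, `is_num`, and `evaluate` are hypothetical, and typecase is left out to keep the example short (it would additionally need match and substitution of typecodes into the branch body).

```python
# Hypothetical encoding: ("var", x), ("lam", x, T, body), ("app", fun, arg),
# ("zero",), ("succ", e), ("test", sel, e_zero, x, e_succ), ("dyn", body, T),
# ("wrong",)
WRONG = ("wrong",)

def subst(e, z, v):
    """Compute e[z <- v] for a closed value v, so no capture can occur."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == z else e
    if tag == "lam":
        _, x, t, body = e
        return e if x == z else ("lam", x, t, subst(body, z, v))
    if tag == "app":
        return ("app", subst(e[1], z, v), subst(e[2], z, v))
    if tag == "succ":
        return ("succ", subst(e[1], z, v))
    if tag == "test":
        _, sel, e_zero, x, e_succ = e
        return ("test", subst(sel, z, v), subst(e_zero, z, v), x,
                e_succ if x == z else subst(e_succ, z, v))
    if tag == "dyn":
        return ("dyn", subst(e[1], z, v), e[2])
    return e  # zero and wrong contain no variables

def is_num(v):
    """True if v is a canonical number: zero, or succ of a canonical number."""
    while v[0] == "succ":
        v = v[1]
    return v[0] == "zero"

def evaluate(e):
    """Reduce the closed expression e to a canonical expression (or WRONG)."""
    tag = e[0]
    if tag in ("wrong", "lam", "zero"):
        return e                              # already canonical
    if tag == "app":
        f = evaluate(e[1])
        if f[0] != "lam":
            return WRONG                      # operator is not a lambda
        w = evaluate(e[2])
        if w == WRONG:
            return WRONG                      # argument went wrong
        return evaluate(subst(f[3], f[1], w)) # call by value
    if tag == "succ":
        v = evaluate(e[1])
        return ("succ", v) if is_num(v) else WRONG
    if tag == "test":
        _, sel, e_zero, x, e_succ = e
        w = evaluate(sel)
        if not is_num(w):
            return WRONG
        if w[0] == "zero":
            return evaluate(e_zero)
        return evaluate(subst(e_succ, x, w[1]))  # x bound to the predecessor
    if tag == "dyn":
        w = evaluate(e[1])
        return WRONG if w == WRONG else ("dyn", w, e[2])
    raise ValueError("evaluation is defined only over closed expressions")
```

As in the rules, a nonterminating program has no result here either: `evaluate` simply does not return (in practice, Python raises a recursion error).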

4.4 Soundness 

We have defined two sets of rules — one for evaluating expressions and one for deriving 
their types. At this point, it is reassuring to observe that the two systems "fit together" 
in the way we would expect. We can show that "evaluation preserves typing" — that if a 
well-typed expression e reduces to a canonical expression v, then v is assigned the same 
type as e by the typing rules. From this it is an easy corollary that no well-typed program 
can evaluate to wrong. 

We begin with a lemma that connects the form of proofs using the typing rules (which 
use type environments) with that of proofs using the evaluation rules (which use substi- 
tution instead of binding environments). Since many of the type environments we are 
concerned with will be empty, we write "$\vdash e : T$" as an abbreviation for "$\emptyset \vdash e : T$."

Lemma 4.4.1 (Substitution preserves typing) For all expressions e, canonical
expressions v, closed types V and W, type environments TE, and variables z, if $\vdash v : V$ and
$TE[z \leftarrow V] \vdash e : W$, then $TE \vdash e[z \leftarrow v] : W$.

Proof: We argue by induction on the length of a derivation of $TE[z \leftarrow V] \vdash e : W$. There
is one case for each of the typing rules; in each case, we must show how to construct a
derivation of $TE \vdash e[z \leftarrow v] : W$ from a derivation whose final step is an application
of the rule in question. We give the proof for three representative cases:

• e = x

If x = z, then $e[z \leftarrow v] = v$. By the typing rule for variables, $TE[z \leftarrow V] \vdash z : V$, so W = V.
Since $\vdash v : V$ by assumption, immediately $TE \vdash e[z \leftarrow v] : V$.

If $x \neq z$, then $e[z \leftarrow v] = x$ and $TE \vdash e[z \leftarrow v] : W$.






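• e = λx:T.$e_{\mathrm{body}}$

If x = z, then $e[z \leftarrow v] = e$. Immediately, $TE \vdash e[z \leftarrow v] : W$.

If $x \neq z$, then for the typing rule for λ-expressions to apply (giving $TE[z \leftarrow V] \vdash
e : T \rightarrow U$ for some T and U), it must be the case that $TE[x \leftarrow T, z \leftarrow V] \vdash e_{\mathrm{body}} : U$.
By the induction hypothesis, $TE[x \leftarrow T] \vdash e_{\mathrm{body}}[z \leftarrow v] : U$. By the typing rule
for λ again, $TE \vdash \lambda x{:}T.\,(e_{\mathrm{body}}[z \leftarrow v]) : T \rightarrow U$. By the definition of substitution,
$TE \vdash e[z \leftarrow v] : T \rightarrow U$.

• e = $e_{\mathrm{fun}}(e_{\mathrm{arg}})$

For the typing rule for application to apply (giving $TE[z \leftarrow V] \vdash e : W$), it must be
the case that $TE[z \leftarrow V] \vdash e_{\mathrm{fun}} : T \rightarrow W$ and $TE[z \leftarrow V] \vdash e_{\mathrm{arg}} : T$ for some T. By the
induction hypothesis, $TE \vdash e_{\mathrm{fun}}[z \leftarrow v] : T \rightarrow W$ and $TE \vdash e_{\mathrm{arg}}[z \leftarrow v] : T$. Now by the
typing rule for application, $TE \vdash (e_{\mathrm{fun}}[z \leftarrow v])(e_{\mathrm{arg}}[z \leftarrow v]) : W$. By the definition of
substitution, $TE \vdash e[z \leftarrow v] : W$.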

End of Proof.

Now we are ready for the soundness theorem itself. 

Theorem 4.4.2 (Soundness) For all expressions e, canonical expressions v, and types
W, if $\vdash e \Rightarrow v$ and $\vdash e : W$, then $\vdash v : W$.

Proof: By induction on the length of the derivation of $\vdash e \Rightarrow v$. There is one case for each
possible syntactic form of e. We show only a few representative cases:

• e = λx:T.$e_{\mathrm{body}}$
Immediate, since v = e.

• e = $e_{\mathrm{fun}}(e_{\mathrm{arg}})$

The typechecking rule for application must be the last step in the derivation of $\vdash e : W$,
so $\vdash e_{\mathrm{arg}} : T$ and $\vdash e_{\mathrm{fun}} : T \rightarrow W$ for some T.

If the last step in the derivation of $\vdash e \Rightarrow v$ is the second evaluation rule for
application, then $\vdash e_{\mathrm{fun}} \Rightarrow u$ for some u not of the form λx:T.$e_{\mathrm{body}}$. But among canonical
expressions, only those of this form are assigned a functional type by the typing rules,
so our assumption contradicts the induction hypothesis.

Similarly, if the last step in the derivation of $\vdash e \Rightarrow v$ is the third evaluation rule for
application, then $\vdash e_{\mathrm{arg}} \Rightarrow \mathtt{wrong}$. But wrong is not assigned any type whatsoever
by the typing rules, again contradicting the induction hypothesis.

So we may assume that the main evaluation rule for application is the last step
in the derivation of $\vdash e \Rightarrow v$, from which it follows that $\vdash e_{\mathrm{fun}} \Rightarrow \lambda x{:}T.\, e_{\mathrm{body}}$,
$\vdash e_{\mathrm{arg}} \Rightarrow w$ ($w \neq \mathtt{wrong}$), and $\vdash e_{\mathrm{body}}[x \leftarrow w] \Rightarrow v$. By the induction hypothesis, $\vdash w : T$
and $\vdash \lambda x{:}T.\, e_{\mathrm{body}} : T \rightarrow W$. Since the last step in the latter derivation must be the
typing rule for λ-expressions, $[x \leftarrow T] \vdash e_{\mathrm{body}} : W$. By Lemma 4.4.1, $\vdash e_{\mathrm{body}}[x \leftarrow w] : W$.
Finally, by the induction hypothesis again, $\vdash v : W$.
• e = dynamic $e_{\mathrm{body}}$:T

If $\vdash e_{\mathrm{body}} \Rightarrow \mathtt{wrong}$, then by the induction hypothesis and the typing rule for dynamic,
$\vdash \mathtt{wrong} : T$. This cannot be the case.

So assume that $\vdash e_{\mathrm{body}} \Rightarrow w$ ($w \neq \mathtt{wrong}$), so that the main evaluation rule for
dynamic is the last step in the derivation of $\vdash e \Rightarrow v$. The typechecking rule for
dynamic must be the last step in the derivation of $\vdash e : W$ (here W = Dynamic), so
$\vdash e_{\mathrm{body}} : T$. By the induction hypothesis, $\vdash w : T$. By the typing rule for dynamic
again, $\vdash v : W$.

• e = typecase $e_{\mathrm{sel}}$ of
        ... $(\vec{X}_i)$ $(x_i{:}T_i)$ $e_i$ ...
        else $e_{\mathrm{else}}$
      end

Assume that $\vdash e_{\mathrm{sel}} \Rightarrow \mathtt{dynamic}\ w{:}U$, that for some k, $match(U, T_k) = \sigma$ while
$match(U, T_j)$ fails for all j < k, and that $\vdash e_k\sigma[x_k \leftarrow w] \Rightarrow v$, so that the main
evaluation rule for typecase is the last step in the derivation of $\vdash e \Rightarrow v$. (The argument
for the second typecase rule is straightforward; the wrong case proceeds as in the
previous two arguments.)

By the typechecking rule for typecase, $\vdash e_{\mathrm{sel}} : \mathtt{Dynamic}$. By the induction hypothesis,
$\vdash w : U$. By the typechecking rule again, $[x_k \leftarrow T_k\sigma] \vdash e_k\sigma : W$. By the definition of
match, this can be rewritten as $[x_k \leftarrow U] \vdash e_k\sigma : W$. By Lemma 4.4.1, $\vdash e_k\sigma[x_k \leftarrow w] : W$.
Now by the induction hypothesis, $\vdash v : W$.

End of Proof. 

Since wrong is not assigned any type by the typing rules, the following is immediate: 

Corollary 4.4.3 For all expressions e, canonical expressions v, and types T, if $\vdash e \Rightarrow v$
and $\vdash e : T$ then $v \neq \mathtt{wrong}$.

5 Denotational Semantics 

Another way of showing that our rules are sound is to define a semantics for the language 
and show that no well-typed expression denotes wrong. In general terms, this involves 
constructing a domain V and defining a "meaning function" that assigns a value $[\![e]\!]_\rho$ in
V to each expression e in each environment $\rho$. The domain V should contain an element
wrong such that $[\![\mathtt{wrong}]\!]_\rho = wrong$ for all $\rho$.
Two properties are highly desirable:

• If e is a well-typed expression then $[\![e]\!]_\rho \neq wrong$ for well-behaved $\rho$.

• If $\vdash e \Rightarrow v$ then $[\![e]\!] = [\![v]\!]$ (that is, evaluation is sound).

To prove the former one, it suffices to map every typecode T to a subset $[\![T]\!]$ of V not
containing wrong, and prove:

• If $\vdash e : T$ then $[\![e]\!]_\rho \in [\![T]\!]$ for all $\rho$ (that is, typechecking is sound).

In this section we carry out this program in an untyped model and suggest an approach 
with a typed model. 

5.1 Untyped Semantics 

In this subsection we give meaning to expressions as elements of an untyped universe V 
and to typecodes as subsets of V. It would appear at first that the meaning of Dynamic 
can simply be defined as the set of all pairs $\langle v, T \rangle$ such that $v \in [\![T]\!]$. But T here ranges
over all types, including Dynamic itself, so this definition as it stands is circular. We must 
build up the denotations of type expressions more carefully. 

We therefore turn to the ideal model of types, following MacQueen, Plotkin, and 
Sethi [22]. (We refer the reader to this paper for the technical background of our con- 
struction.) Typecodes denote ideals — nonempty subsets of V closed under approximations 
and limits. We denote by Idl the set of all ideals in V. 

The ideal model has several features worth appreciating. First, to some extent the ideal 
model captures the intuition that types are sets of structurally similar values. Second, the 
ideal model accounts for diverse language constructs, including certain kinds of polymor- 
phism. Finally, a large family of recursive type equations are guaranteed to have unique 
solutions. We exploit this feature to define the meaning of Dynamic with a recursive type 
equation. 

We choose a universe V that satisfies the isomorphism equation 

$$V \cong N + (V \rightarrow V) + (V \times TypeCode) + W,$$

where N is the flat domain of natural numbers and W is the type error domain $\{w\}_\bot$. The
usual continuous function space operation is represented as $\rightarrow$; the product space $E \times A$ of
a cpo E and a set A is defined as $\{\langle e, a \rangle \mid e \in E,\ e \neq \bot,\ \text{and } a \in A\} \cup \{\bot_{E \times A}\}$, with the
evident ordering.

V can be obtained as the limit of a sequence of approximations $V_0, V_1, \ldots$, where

$$V_{i+1} = N + (V_i \rightarrow V_i) + (V_i \times TypeCode) + W.$$

We omit the details of the construction, which are standard [3, 22].

At this point, we have a universe suitable for assigning a meaning to expressions in
our programming language. Figure 3 gives a full definition of the denotation function $[\![\ ]\!]$,
using the following notation:

• "d in V," where d belongs to a summand S of V, is the injection of d into V;

• wrong is an abbreviation for "w in V";
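$$\begin{array}{rcl}
[\![\ ]\!] &:& Exp \rightarrow (Var \rightarrow V) \rightarrow V \\[4pt]
[\![\mathtt{wrong}]\!]_\rho &=& wrong \\
[\![\lambda x{:}T.\, e_{\mathrm{body}}]\!]_\rho &=& (\lambda v.\ \text{if } v = wrong \text{ then } wrong \text{ else } [\![e_{\mathrm{body}}]\!]_{\rho\{x \leftarrow v\}}) \text{ in } V \\
[\![e_{\mathrm{fun}}(e_{\mathrm{arg}})]\!]_\rho &=& \text{if } [\![e_{\mathrm{fun}}]\!]_\rho \notin (V \rightarrow V) \text{ then } wrong \text{ else } ([\![e_{\mathrm{fun}}]\!]_\rho|_{V \rightarrow V})([\![e_{\mathrm{arg}}]\!]_\rho) \\
[\![\mathtt{0}]\!]_\rho &=& 0 \text{ in } V \\
[\![\mathtt{succ}\ e_{\mathrm{nat}}]\!]_\rho &=& \text{if } [\![e_{\mathrm{nat}}]\!]_\rho \notin N \text{ then } wrong \text{ else } ([\![e_{\mathrm{nat}}]\!]_\rho|_N + 1) \text{ in } V \\
[\![\mathtt{test}\ e_{\mathrm{nat}}\ \mathtt{0}{:}\,e_{\mathrm{zero}}\ \mathtt{succ}(x){:}\,e_{\mathrm{succ}}]\!]_\rho &=& \text{if } [\![e_{\mathrm{nat}}]\!]_\rho \notin N \text{ then } wrong \\
&& \text{else if } [\![e_{\mathrm{nat}}]\!]_\rho = 0 \text{ in } V \text{ then } [\![e_{\mathrm{zero}}]\!]_\rho \\
&& \text{else } [\![e_{\mathrm{succ}}]\!]_{\rho\{x \leftarrow (([\![e_{\mathrm{nat}}]\!]_\rho|_N - 1) \text{ in } V)\}} \\
[\![\mathtt{dynamic}\ e_{\mathrm{body}}{:}T]\!]_\rho &=& \text{if } [\![e_{\mathrm{body}}]\!]_\rho = wrong \text{ then } wrong \text{ else } (\langle [\![e_{\mathrm{body}}]\!]_\rho, T \rangle \text{ in } V) \\
[\![\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of} \ldots (\vec{X}_i)(x_i{:}T_i)\,e_i \ldots \mathtt{else}\ e_{\mathrm{else}}]\!]_\rho &=& \text{if } [\![e_{\mathrm{sel}}]\!]_\rho \notin (V \times TypeCode) \text{ then } wrong \\
&& \text{else let } \langle d, U \rangle = [\![e_{\mathrm{sel}}]\!]_\rho|_{V \times TypeCode} \text{ in} \\
&& \quad \text{if } \ldots \\
&& \quad \text{else if } match(U, T_i) \text{ succeeds} \\
&& \quad\quad \text{then let } \sigma = match(U, T_i) \text{ in } [\![e_i\sigma]\!]_{\rho\{x_i \leftarrow d\}} \\
&& \quad \text{else if } \ldots \\
&& \quad \text{else } [\![e_{\mathrm{else}}]\!]_\rho
\end{array}$$

Figure 3: The Meaning Function for Expressions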



• $v|_S$ yields: if v = (d in V) for some $d \in S$ then d, otherwise $\bot$;

• $v \in S$ yields $\bot$ if $v = \bot$, true if v = (d in V) for some $d \in S$, and false otherwise;

• = yields $\bot$ whenever either argument does.

Note that the definition of = guarantees that $[\![(\lambda x{:}T.\, e_{\mathrm{body}})(e_{\mathrm{arg}})]\!]_\rho = \bot$ whenever
$[\![e_{\mathrm{arg}}]\!]_\rho = \bot$.

The denotation function "commutes" with substitutions and evaluation is sound with 
respect to the denotation function: 

Lemma 5.1.1 Let e be an expression, $\sigma$ a substitution, and $\rho$ and $\rho'$ two environments.
Assume that $\rho$ maps each variable symbol x for which $\sigma$ is defined to $[\![x\sigma]\!]_{\rho'}$, and that it
coincides with $\rho'$ elsewhere. Then $[\![e\sigma]\!]_{\rho'} = [\![e]\!]_\rho$.

Proof: The proof is a tedious inductive argument, and we omit it. End of Proof. 
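Theorem 5.1.2 For all expressions e and v, if $\vdash e \Rightarrow v$ then $[\![e]\!] = [\![v]\!]$.

Proof: We argue by induction on the derivation of $\vdash e \Rightarrow v$. There is one case for each
evaluation rule. We give only a few typical ones.

• For function applications: Assume that $[\![e_{\mathrm{fun}}]\!] = [\![\lambda x{:}T.\, e_{\mathrm{body}}]\!]$, $[\![e_{\mathrm{arg}}]\!] = [\![w]\!]$ with
$w \neq \mathtt{wrong}$, and $[\![e_{\mathrm{body}}[x \leftarrow w]]\!] = [\![v]\!]$, to prove that $[\![e_{\mathrm{fun}}(e_{\mathrm{arg}})]\!] = [\![v]\!]$. Note
that $[\![\lambda x{:}T.\, e_{\mathrm{body}}]\!]_\rho$ must be a function from V to V for all $\rho$, and w cannot
denote wrong (since w is canonical). Therefore, we have $[\![e_{\mathrm{fun}}(e_{\mathrm{arg}})]\!]_\rho = [\![e_{\mathrm{body}}]\!]_{\rho\{x \leftarrow v\}}$
where $v = [\![e_{\mathrm{arg}}]\!]_\rho$, for all $\rho$. Since $[\![e_{\mathrm{arg}}]\!] = [\![w]\!]$, Lemma 5.1.1 yields $[\![e_{\mathrm{fun}}(e_{\mathrm{arg}})]\!]_\rho =
[\![e_{\mathrm{body}}[x \leftarrow w]]\!]_\rho$, and the hypothesis $[\![e_{\mathrm{body}}[x \leftarrow w]]\!] = [\![v]\!]$ immediately leads to the
desired equation.

• For construction of dynamic values: Assume that $[\![e_{\mathrm{body}}]\!] = [\![w]\!]$ with $w \neq \mathtt{wrong}$,
to prove that $[\![\mathtt{dynamic}\ e_{\mathrm{body}}{:}T]\!] = [\![\mathtt{dynamic}\ w{:}T]\!]$. As in the previous case,
because w cannot denote wrong, we have $[\![\mathtt{dynamic}\ e_{\mathrm{body}}{:}T]\!]_\rho = \langle [\![e_{\mathrm{body}}]\!]_\rho, T \rangle$ and
$[\![\mathtt{dynamic}\ w{:}T]\!]_\rho = \langle [\![w]\!]_\rho, T \rangle$. The desired equation follows at once from $[\![e_{\mathrm{body}}]\!] = [\![w]\!]$.

• For typecase operations: Assume that $[\![e_{\mathrm{sel}}]\!] = [\![\mathtt{dynamic}\ w{:}T]\!]$, $match(T, T_j)$ fails for
all j < k, $match(T, T_k) = \sigma$, and $[\![e_k\sigma[x_k \leftarrow w]]\!] = [\![v]\!]$, to prove that $[\![\mathtt{typecase}\ e_{\mathrm{sel}}\
\mathtt{of} \ldots (\vec{X}_i)\ (x_i{:}T_i)\ e_i \ldots \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}]\!] = [\![v]\!]$. As usual, w cannot denote wrong,
and hence we obtain the following chain of equalities, for arbitrary $\rho$: $[\![\mathtt{typecase}\ e_{\mathrm{sel}}\
\mathtt{of} \ldots (\vec{X}_i)\ (x_i{:}T_i)\ e_i \ldots \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}]\!]_\rho$ equals $[\![e_k\sigma]\!]_{\rho\{x_k \leftarrow d\}}$, where d is $[\![w]\!]_\rho$ (by
the hypotheses and the definition of $[\![\ ]\!]$), equals $[\![e_k\sigma[x_k \leftarrow w]]\!]_\rho$ (by Lemma 5.1.1),
equals $[\![v]\!]_\rho$ (by the hypotheses). The case where the else branch of a typecase is
chosen is similar but simpler.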

End of Proof. 

Although we now have a meaning $[\![e]\!]$ for each program e, we do not yet have a meaning
$[\![T]\!]$ for each typecode T. Therefore, in particular, we cannot prove yet that typechecking
is sound. The main difficulty, of course, is to decide on the meaning of Dynamic.

We define the type of dynamic values with a recursive equation. Some auxiliary oper- 
ations are needed to write this equation. 

Definition 5.1.3 If $I \subseteq V$ is a set of values and T is a typecode, then
$$I_T = \{c \mid \langle c, T \rangle \in I\}.$$

(Often, and in these definitions in particular, we omit certain injections from summands 
into V and the corresponding projections from V to its summands, which can be recovered 
from context.) 

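Definition 5.1.4 If $I \subseteq V$ and $J \subseteq V$ are two sets of values, then
$$I \twoheadrightarrow J = \{\langle c, T \rightarrow U \rangle \mid c(I_T) \subseteq J_U, \text{ where } T, U \in TypeCode\}.$$

Note that if I and J are ideals then so is $I \twoheadrightarrow J$.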

Using these definitions, we can write an equation for the type of dynamic values: 
$$\begin{array}{rcl} D &=& N \times \{\mathtt{Nat}\} \\ &\cup& D \twoheadrightarrow D \\ &\cup& D \times \{\mathtt{Dynamic}\} \end{array}$$

Here the variable D ranges over Idl, the set of all ideals in V. 

The equation follows from our informal definition of the type of dynamic values as the
set of pairs $\langle v, T \rangle$ where $v \in [\![T]\!]$. Intuitively, the equation states that a dynamic value
can be one of three things. First, a dynamic value with tag Nat must contain a natural
number. Second, if $\langle c, T \rightarrow U \rangle$ is a dynamic value then $c(v) \in [\![U]\!]$ for all $v \in [\![T]\!]$, and hence
$\langle c(v), U \rangle$ is a dynamic value whenever $\langle v, T \rangle$ is. Third, a dynamic value with tag Dynamic
must contain a dynamic value.

How is one to guarantee that this equation actually defines the meaning of Dynamic? 
MacQueen, Plotkin, and Sethi invoke the Banach Fixed Point Theorem to show that 
equations of the form D = F(D) over Idl have unique solutions, provided F is contractive
in the following sense.

Informally, the rank r(a) of an element a of V is the least i such that a "appears" in $V_i$
during the construction of V as a limit. A witness for two ideals I and J is an element that
belongs to one but not to the other; their distance d(I, J) is $2^{-r}$, where r is the minimum
rank of a witness for the ideals. The function G is contractive if there exists a real number
t < 1 such that for all $X_1, \ldots, X_n, X'_1, \ldots, X'_n$, we have

$$d(G(X_1, \ldots, X_n), G(X'_1, \ldots, X'_n)) \le t \cdot \max\{d(X_i, X'_i) \mid 1 \le i \le n\}.$$

Typically, one guarantees that an operation is contractive by expressing it in terms of
basic operations such as $\times$ and $\rightarrow$ and then inspecting the structure of this expression. In
our case, we have a new basic operation, $\twoheadrightarrow$; in addition, $\times$ is slightly nonstandard. We
need to prove that these two operations are contractive.

Theorem 5.1.5 The operation $\times$ is contractive (when its second argument is fixed). The
operation $\twoheadrightarrow$ is contractive.

Proof: The arguments are based on the corresponding ones for Theorem 7 of [22]. In
fact, the proof for $\times$ is a trivial variant of the corresponding one. We give only the proof
for $\twoheadrightarrow$.

Let c be a witness of minimum rank for $I \twoheadrightarrow J$ and $I' \twoheadrightarrow J'$, being, say, only in the former
ideal. Then $c \neq \bot$ (otherwise it would not be a witness), so $c = \langle f, T \rightarrow U \rangle$ for some f,
T, and U. By the analogue of Proposition 4 of [22], $f = \bigsqcup (a_i \Rightarrow b_i)$ for some $a_i, b_i \in V$,
with $r(f) > \max(r(a_i), r(b_i))$ (here $a_i \Rightarrow b_i$ denotes the step function which returns $b_i$
for arguments larger than $a_i$ and $\bot$ otherwise). Since $c \notin I' \twoheadrightarrow J'$, f is not in $I'_T \rightarrow J'_U$.
Hence there must be an $x \in I'_T$ such that $f(x) \notin J'_U$. Let $a = \bigsqcup \{a_i \mid a_i \sqsubseteq x\}$ and
$b = \bigsqcup \{b_i \mid a_i \sqsubseteq x\} = f(x)$. Then $a \in I'_T$ (since $a \sqsubseteq x$) but $b \notin J'_U$. Moreover, by the
analogue of Proposition 4 of [22], $r(a) < \max\{r(a_i) \mid a_i \sqsubseteq x\} < r(f)$ and $r(b) < r(f)$.
Similarly, $r(a) + 1 < r(c)$ and $r(b) + 1 < r(c)$.

There are two cases. If $a \notin I_T$ then $\langle a, T \rangle$ is a witness for I and I' of rank less than
r(c). (For all v, $r(\langle v, T \rangle) < r(v) + 1$.) Otherwise, $a \in I_T$ and so $b = f(a) \in J_U$ since
$f \in I_T \rightarrow J_U$. Thus $\langle b, U \rangle$ is a witness for J and J' of rank less than r(c). In either case, we
have $c(I \twoheadrightarrow J, I' \twoheadrightarrow J') = r(c) > \min(c(I, I'), c(J, J'))$. End of Proof.

Immediately, the general result about the existence of fixed points yields the desired 
theorem. 

Theorem 5.1.6 The equation 

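$$\begin{array}{rcl} D &=& N \times \{\mathtt{Nat}\} \\ &\cup& D \twoheadrightarrow D \\ &\cup& D \times \{\mathtt{Dynamic}\} \end{array}$$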

has a unique solution in Idl. 

Let us call this solution Dynamic. 



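$$\begin{array}{rcl}
[\![\ ]\!] &:& TypeCode \rightarrow Idl \\[4pt]
[\![\mathtt{Nat}]\!] &=& N \\
[\![\mathtt{Dynamic}]\!] &=& Dynamic \\
[\![T \rightarrow U]\!] &=& \{c \mid c([\![T]\!]) \subseteq [\![U]\!]\}
\end{array}$$

Figure 4: The Meaning Function for Typecodes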



Finally, we are in a position to associate an ideal $[\![T]\!]$ with each typecode T (see Figure 4).
The semantics fits our original intuition of what dynamic values are, as the following lemma
shows.

Lemma 5.1.7 For all values v and typecodes T, $\langle v, T \rangle \in Dynamic$ if and only if $v \in [\![T]\!]$.

Proof: The proof is by induction on the structure of T. 

For T = Nat, we need to check that $\langle v, \mathtt{Nat} \rangle \in Dynamic$ if and only if $v \in [\![\mathtt{Nat}]\!]$. This
follows immediately from the equation, since all and only natural numbers are tagged with
Nat.

Similarly, for T = Dynamic, we need to check that $\langle v, \mathtt{Dynamic} \rangle \in Dynamic$ if and only
if $v \in [\![\mathtt{Dynamic}]\!]$. This follows immediately from the equation, since all and only dynamic
values are tagged with Dynamic.

Finally, for $T = U \rightarrow V$, we need to check that $\langle v, U \rightarrow V \rangle \in Dynamic$ if and only if
$v \in [\![U \rightarrow V]\!]$. By induction hypothesis, we have $Dynamic_U = [\![U]\!]$ and $Dynamic_V =
[\![V]\!]$. We derive the following chain of equivalences: $\langle v, U \rightarrow V \rangle \in Dynamic$ if and only if
$v(Dynamic_U) \subseteq Dynamic_V$ (according to the equation), if and only if $v([\![U]\!]) \subseteq [\![V]\!]$ (by
induction hypothesis), if and only if $v \in [\![U \rightarrow V]\!]$ (according to the definition of $[\![\ ]\!]$). End
of Proof.

We can also prove the soundness of typechecking: 
Definition 5.1.8 The environment $\rho$ is consistent with the type environment TE on the
expression e if TE(x) is defined and $\rho(x) \in [\![TE(x)]\!]$ for all $x \in FV(e)$.

Theorem 5.1.9 For all type environments TE, expressions e, environments $\rho$ consistent
with TE on e, and typecodes T, if $TE \vdash e : T$ then $[\![e]\!]_\rho \in [\![T]\!]$.

Proof: We argue by induction on the derivation of TE h e : T. There is one case for 
each typing rule. We give only a few typical ones. 

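• For abstractions: Assume that $[\![e]\!]_\rho \in [\![T]\!]$ for all TE and all $\rho$ consistent with
$TE[x \leftarrow U]$ on e, to prove that $[\![\lambda x{:}U.\, e]\!]_\rho \in [\![U \rightarrow T]\!]$ for all TE and all $\rho$
consistent with TE on λx:U.e. Consider some $v \in [\![U]\!]$. According to the definition
of $[\![\ ]\!]$, we need to show that $([\![\lambda x{:}U.\, e]\!]_\rho)\,v \in [\![T]\!]$. We may assume that $v \neq \bot$
(the $\bot$ case is trivial), and $v \neq wrong$ ($[\![U]\!]$ cannot contain wrong). Thus, we have
$[\![\lambda x{:}U.\, e]\!]_\rho\, v = [\![e]\!]_{\rho\{x \leftarrow v\}}$. The hypothesis immediately yields that this value is a
member of $[\![T]\!]$.

• For function applications: Assume that $[\![e_{\mathrm{fun}}]\!]_\rho \in [\![U \rightarrow T]\!]$ and that $[\![e_{\mathrm{arg}}]\!]_\rho \in [\![U]\!]$, to
prove that $[\![e_{\mathrm{fun}}(e_{\mathrm{arg}})]\!]_\rho \in [\![T]\!]$. By the definition of function types, $[\![e_{\mathrm{fun}}]\!]_\rho$ must be a
function from V to V, and $[\![e_{\mathrm{arg}}]\!]_\rho$ cannot be wrong, since $[\![U]\!]$ cannot contain wrong.
In addition, we may assume that $[\![e_{\mathrm{arg}}]\!]_\rho \neq \bot$ (the $\bot$ case is trivial). Immediately,
$[\![e_{\mathrm{fun}}(e_{\mathrm{arg}})]\!]_\rho = ([\![e_{\mathrm{fun}}]\!]_\rho)\,[\![e_{\mathrm{arg}}]\!]_\rho$, and the definition of $[\![U \rightarrow T]\!]$ yields that this value
must be a member of $[\![T]\!]$.

• For construction of dynamic values: Assume that $[\![e_{\mathrm{body}}]\!]_\rho \in [\![T]\!]$, to prove that
$[\![\mathtt{dynamic}\ e_{\mathrm{body}}{:}T]\!]_\rho \in [\![\mathtt{Dynamic}]\!]$. Since $[\![T]\!]$ cannot contain wrong, $[\![e_{\mathrm{body}}]\!]_\rho \neq wrong$,
and hence $[\![\mathtt{dynamic}\ e_{\mathrm{body}}{:}T]\!]_\rho = \langle [\![e_{\mathrm{body}}]\!]_\rho, T \rangle$. The desired result then follows from
Lemma 5.1.7.

• For typecase operations: Assume that $[\![e_{\mathrm{sel}}]\!]_\rho \in Dynamic$ for all TE and all $\rho$
consistent with TE on $e_{\mathrm{sel}}$; $[\![e_i\sigma]\!]_\rho \in [\![T]\!]$ for all i, for all $\sigma \in Subst_{\vec{X}_i}$, and for all
TE and all $\rho$ consistent with $TE[x_i \leftarrow T_i\sigma]$ on $e_i\sigma$; and $[\![e_{\mathrm{else}}]\!]_\rho \in [\![T]\!]$ for all TE and
all $\rho$ consistent with TE on $e_{\mathrm{else}}$. We prove that $[\![\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of} \ldots (\vec{X}_i)\ (x_i{:}T_i)\
e_i \ldots \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}]\!]_\rho \in [\![T]\!]$ for all TE and all $\rho$ consistent with TE on (typecase
$e_{\mathrm{sel}}$ of ... $(\vec{X}_i)$ $(x_i{:}T_i)$ $e_i$ ... else $e_{\mathrm{else}}$ end). Similarly to the other cases, $[\![e_{\mathrm{sel}}]\!]_\rho$ must
be the pair of a value and a typecode, and we may assume that it is not $\bot$. Hence,
$[\![\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of} \ldots (\vec{X}_i)\ (x_i{:}T_i)\ e_i \ldots \mathtt{else}\ e_{\mathrm{else}}\ \mathtt{end}]\!]_\rho$ is either $[\![e_i\sigma]\!]_{\rho\{x_i \leftarrow d\}}$ for
some i and with d equal to the first component of the selector, or simply $[\![e_{\mathrm{else}}]\!]_\rho$. In
the former case, Lemma 5.1.7 guarantees that $d \in [\![T_i\sigma]\!]$, and hence the hypotheses
guarantee that $[\![e_i\sigma]\!]_{\rho\{x_i \leftarrow d\}} \in [\![T]\!]$. In the latter case, the hypotheses guarantee that
$[\![e_{\mathrm{else}}]\!]_\rho \in [\![T]\!]$. In either case, we derive $[\![\mathtt{typecase}\ e_{\mathrm{sel}}\ \mathtt{of} \ldots (\vec{X}_i)\ (x_i{:}T_i)\ e_i \ldots \mathtt{else}\
e_{\mathrm{else}}\ \mathtt{end}]\!]_\rho \in [\![T]\!]$.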

End of Proof. 

It follows from Theorem 5.1.2, Theorem 5.1.9, and the fact that no $[\![T]\!]$ can contain
(w in V) that no well-typed expression evaluates to wrong. This gives us a new proof of
Corollary 4.4.3.
5.2 Typed Semantics

The semantics $[\![\ ]\!]$ is, essentially, a semantics for the untyped lambda-calculus, as in its
definition type information is ignored. This seems very appropriate for languages with
implicit typing, where some or all of the type information is omitted in programs. But for
an explicitly typed language it seems natural to look for a semantics that assigns elements
of domains $V_T$ to expressions of type T. One idea to find these domains is to solve the
infinite set of simultaneous equations



A similar use of sums appears in Mycroft's work [28]. 

6 Extensions 

In this section we present some preliminary thoughts on extending the ideas in the rest of 
the paper to languages with implicit or explicit polymorphism, abstract data types, and 
more expressive type patterns. 

6.1 Polymorphism 

For most of the section, we assume an explicitly typed polymorphic lambda calculus along 
the lines of Reynolds' system [32]. The type abstraction operator is written as Λ. Type
application is written with square brackets. The types of polymorphic functions begin with ∀.
For example, ∀T.T→T is the type of the polymorphic identity function, ΛT.λx:T.x.

In the simplest case, the typechecking and operational semantics of dynamic and 
typecase carry over nearly unchanged from the language described in Section 4. We 
simply redefine match as follows: 

If there is a substitution $\sigma$ such that T and $U\sigma$ are identical up to renaming
of bound type variables, then match(T, U) returns some such substitution.
Otherwise, match(T, U) fails.
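The new ingredient in this definition is the comparison "identical up to renaming of bound type variables." The following Python sketch shows one way such a comparison could be written; it covers only that comparison, not the search for the substitution. The tuple encoding of types, with `("forall", X, body)` for quantified types, and the name `alpha_eq` are assumptions of this example rather than anything prescribed by the paper.

```python
def alpha_eq(t, u, env=()):
    """True if types t and u are identical up to renaming of bound variables.

    Types are strings (base types or type variables), ("->", A, B), or
    ("forall", X, body); this encoding is hypothetical.
    env pairs up the binders enclosing t and u, innermost first.
    """
    if isinstance(t, str) and isinstance(u, str):
        for a, b in env:
            if a == t or b == u:
                # The innermost binder mentioning either name decides:
                # both names must refer to that same binder.
                return a == t and b == u
        return t == u  # free variables and base types must agree by name
    if isinstance(t, tuple) and isinstance(u, tuple) and t[0] == u[0]:
        if t[0] == "->":
            return alpha_eq(t[1], u[1], env) and alpha_eq(t[2], u[2], env)
        if t[0] == "forall":
            return alpha_eq(t[2], u[2], ((t[1], u[1]),) + env)
    return False
```

With this encoding, the tags ∀T.T→T and ∀Z.Z→Z compare equal, whereas ∀T.T→Nat and ∀Z.Z→Z do not.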

We can now write typecase expressions that match polymorphic type tags. For ex- 
ample, the following function checks that f is a polymorphic function taking elements of 
any type into Nat. It then instantiates f at W, the type tag of its second argument, and 
applies the result to the value part of the second argument. 
    λdf:Dynamic. λde:Dynamic.
      typecase df of
        (f: ∀Z. Z → Nat)
          typecase de of
            (W) (e: W) f[W](e)
            else 0
          end
        else 0
      end

6.2 Abstract Data Types 

In a similar vein, we can imagine extending the language of type tags to include existentially 
quantified variables. Following Mitchell and Plotkin [27], we can think of a Dynamic whose 
tag is an existential type as being a module with hidden implementation, or alternatively 
as an encapsulated element of an abstract data type. Our notation for existential types 
and labeled products follows that of Cardelli and Wegner [11]. For example, 
    λs: ∃Rep. {push: Rep → Nat → Rep,
               pop: Rep → (Nat × Rep),
               top: Rep → Nat,
               empty: Rep}.
      open s as stk[Rep]
      in stk.top(stk.push stk.empty 5)
is a function that takes a stack package (a tuple containing a hidden representation type 
Rep, three functions, and a constant value), opens the package (making its components 
accessible in the body of the open expression), and performs the trivial computation of 
pushing the number 5 onto an empty stack and returning the top element of the resulting 
stack. 

The following function takes a Dynamic containing a stack package (with hidden rep- 
resentation) and another Dynamic of the same type as the elements of the stack. It pushes 
its second argument onto the empty stack from the stack package, and returns the top of 
the resulting stack, appropriately repackaged as a dynamic value. 
    λds:Dynamic. λde:Dynamic.
      typecase ds of
        (X) (s: ∃Rep. {push: Rep → X → Rep,
                       pop: Rep → (X × Rep),
                       top: Rep → X,
                       empty: Rep})
          typecase de of
            (e: X) open s as stk[Rep]
                   in dynamic stk.top(stk.push stk.empty e) : X
            else de
          end
        else de
      end

In order to preserve the integrity of existentially quantified values in a language that 
also has Dynamic, it seems necessary to place some restrictions on the types that may 
appear in dynamic expressions to prevent their being used to expose the witness type 
of an existentially quantified value beyond the scope of an open (or abstype) block. In 
particular, the type tag in a dynamic constructor must not be allowed to mention the
representation types of any currently open abstract data types, as in the following:

    λds:Dynamic. λde:Dynamic.
      typecase ds of
        (X) (s: ∃Rep. {push: Rep → X → Rep,
                       pop: Rep → (X × Rep),
                       top: Rep → X,
                       empty: Rep})
          open s as stk[Rep]
          in (* Wrong: *) dynamic stk.empty : Rep
        else de
      end

It would be wrong here to create a Dynamic whose type tag is the representation 
type of the stack (assuming such type is available at run-time), because this would violate 
the abstraction. It is also unclear how to generate a type tag that does not violate the 
abstraction. Hence we choose to forbid this situation. 

6.3 Restrictions 

In a language with both explicit polymorphism and Dynamic, it is possible to write pro- 
grams where types must actually be passed to functions at run time: 

    ΛX. λx:X. dynamic x:X

The extra cost of actually performing type abstractions and applications at run time (rather 
than just checking them during compilation and then discarding them) should not be 
prohibitive. Still, we might also want to consider how the dynamic construct might be 
restricted so that types need not be passed around during execution. A suitable restriction 
is that an expression dynamic e:T is well-formed only if T is closed. 
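
The closedness condition can be checked syntactically. The following is a minimal sketch in Python, not part of the language defined in this paper; the tuple encoding of type expressions and the names `free_type_vars` and `check_dynamic_tag` are assumptions made for illustration only.

```python
# Hypothetical encoding of type expressions (an assumption, not the paper's):
#   "Nat", "String"          base types
#   ("var", "X")             type variable
#   ("->", dom, cod)         function type
#   ("forall", "X", body)    polymorphic type

def free_type_vars(ty):
    """Return the set of type variables that occur free in ty."""
    if isinstance(ty, str):
        return set()
    kind = ty[0]
    if kind == "var":
        return {ty[1]}
    if kind == "->":
        return free_type_vars(ty[1]) | free_type_vars(ty[2])
    if kind == "forall":
        # The bound variable is no longer free inside the body.
        return free_type_vars(ty[2]) - {ty[1]}
    raise ValueError(f"unknown type expression: {ty!r}")

def check_dynamic_tag(ty):
    """Accept the tag T of `dynamic e:T` only if T is closed."""
    free = free_type_vars(ty)
    if free:
        raise TypeError(f"tag mentions free type variables: {sorted(free)}")
    return ty
```

Under this check the tag X in ΛX. λx:X. dynamic x:X is rejected, because X is free at the point where the tag is written, while a tag such as ∀Z. Z → Z is accepted.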

This restriction was proposed by Mycroft [28] in the context of an extension of ML, 
which uses implicit rather than explicit polymorphism. The appropriate analogue of 
"closed type expressions" in ML is "type expressions with only generic type variables" — 
expressions whose type variables are either instantiated to some known type or else totally 
undetermined (that is, not dependent on any type variable whose value is unknown at 
compile time). 

In fact, in languages with implicit polymorphism, Mycroft's restriction on dynamic 
is required: there is no natural way to determine where the type applications should be 
performed at run time. Dynamics with non-generic variables can be used to break the ML 
type system. (The problem is analogous to that of "updateable refs" [37].) 

6.4 Higher-order Pattern Variables 

By enriching the language of type patterns, it is possible to express a much broader range 
of computations on Dynamics, including some interesting ones involving polymorphic func- 
tions. Our motivating example here is a generalization of the dynamic application function 
from Section 3. The problem there is to take two dynamic values, make sure that the first 
is a function and the second an argument belonging to the function's domain, and apply 
the function. Here we want to allow the first argument to be a polymorphic function and 
narrow it to an appropriate monomorphic instance automatically, before applying it to the 
supplied parameter. We call this "polymorphic dynamic application." 

To express this example, we need to extend the typecase construct with "functional" 
pattern variables. Whereas ordinary pattern variables range over type expressions, func- 
tional pattern variables (named F, G, etc., to distinguish them from ordinary pattern 
variables) range over functions from type expressions to type expressions. 

Using functional pattern variables, polymorphic dynamic application can be expressed 
as follows: 

    λdf:Dynamic. λde:Dynamic.
      typecase df of
        (F,G) (f: ∀Z. (F Z) → (G Z))
          typecase de of
            (W) (e: (F W))
              dynamic f[W](e) : (G W)
            else
              dynamic "Error" : String
          end
        else
          dynamic "Error" : String
      end

For instance, when we apply the function to the arguments 

    df = dynamic (ΛZ. λx:Z. x) : (∀Z. Z → Z)
    de = dynamic 3 : Nat

the first branch of the outer typecase succeeds, binding F and G to the identity function 
on type expressions. The first branch of the inner typecase succeeds, binding W to Nat so 
that (F W) = Nat and (G W) = Nat. Now f[W] reduces to λx:(F W).x and f[W](e) 
reduces to 3, which has type (G W) = Nat as claimed. 
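
The worked example can be imitated with a small Python sketch. It is a simplification, not the typecase construct itself: a Dynamic is modelled as a (value, tag) pair, type application is erased (Python functions are untyped), and instead of general functional pattern variables the sketch reads F and G directly off a tag of the form ∀Z. dom → cod and finds W by first-order matching of dom against the argument's tag. The tuple encoding and the names `solve`, `subst`, and `dyn_apply` are assumptions, and bound-variable shadowing is ignored.

```python
# Hypothetical tag encoding: "Nat", ("var", "Z"), ("->", dom, cod),
# ("forall", "Z", body), and constructors such as ("List", elem).

ERROR = ("Error", "String")  # stands in for: dynamic "Error" : String

def solve(dom, tag, var, env):
    """Extend env so that dom with var replaced by env[var] equals tag."""
    if dom == ("var", var):
        if var in env:
            return env[var] == tag      # repeated occurrence must agree
        env[var] = tag
        return True
    if isinstance(dom, str) or isinstance(tag, str):
        return dom == tag
    return (len(dom) == len(tag) and dom[0] == tag[0]
            and all(solve(d, t, var, env) for d, t in zip(dom[1:], tag[1:])))

def subst(ty, var, replacement):
    """Replace every occurrence of the type variable var in ty."""
    if ty == ("var", var):
        return replacement
    if isinstance(ty, str):
        return ty
    return (ty[0],) + tuple(subst(t, var, replacement) for t in ty[1:])

def dyn_apply(df, de):
    """Apply a dynamic polymorphic function to a dynamic argument."""
    f, f_tag = df
    e, e_tag = de
    is_poly_fun = (isinstance(f_tag, tuple) and f_tag[0] == "forall"
                   and isinstance(f_tag[2], tuple) and f_tag[2][0] == "->")
    if not is_poly_fun:
        return ERROR
    _, z, (_, dom, cod) = f_tag         # F = λz.dom, G = λz.cod
    env = {}
    # If z does not occur in dom, W is unconstrained; this sketch gives up.
    if not solve(dom, e_tag, z, env) or z not in env:
        return ERROR
    return (f(e), subst(cod, z, env[z]))  # f[W](e) : (G W)

poly_id = (lambda x: x, ("forall", "Z", ("->", ("var", "Z"), ("var", "Z"))))
result = dyn_apply(poly_id, (3, "Nat"))
```

With the arguments from the example above, `result` is the pair `(3, "Nat")`, mirroring dynamic 3 : Nat.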

Another intriguing example is polymorphic dynamic composition: 
    λdf:Dynamic. λdg:Dynamic.
      typecase df of
        (F,G) (f: ∀W. (F W) → (G W))
          typecase dg of
            (H) (g: ∀V. (G V) → (H V))
              dynamic (ΛW. g[W] ∘ f[W])
                : ∀V. (F V) → (H V)
            else ...
          end
        else ...
      end

This function checks that its two arguments are both polymorphic functions and that 
their composition is well-typed, returning the composition if so. 

6.5 Open Issues 

This preliminary treatment of polymorphism and higher-order pattern variables leaves a 
number of questions unanswered: What is the appropriate specification for the match 
operation? How difficult is it to compute? Is there a sensible notion of "most general 
substitution" when pattern variables can range over things like functions from type ex- 
pressions to type expressions? Should pattern variables range over all functions from type 
expressions to type expressions, or only over some more restricted class of functions? What 
are the implications (for both operational and denotational semantics) of implicit vs. ex- 
plicit polymorphism? We hope that our examples may stimulate the creativity of others 
in helping to answer these questions. 

7 Implementation Issues 

This section discusses some of the issues that arise in implementations of languages with 
dynamic values and a typecase construct: methods for efficient transfer of dynamic values 
to and from persistent storage, implementation of the match function, and representation 
of type tags for efficient matching. 

7.1 Persistent Storage 

One of the most important purposes of dynamic values is as a safe and uniform format 
for persistent data. This facility may be heavily exploited in large software environments, 
so it is important that it be implemented efficiently. Large data structures, possibly with 
circularities and shared substructures, need to be represented externally so that they can 
be quickly rebuilt in the heap of a running program. (The type tags present no special 
difficulties: they are ordinary run-time data structures.) 
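
As a present-day illustration of the requirement (not one of the systems discussed in this section), Python's standard `pickle` module writes heap structures with sharing and cycles to a byte string and rebuilds them with the same shape. The sketch below stores a value together with its type tag and checks the tag on the way back in; the tag encoding and the names `to_persistent` and `from_persistent` are assumptions.

```python
import pickle

def to_persistent(value, tag):
    """Externalize a dynamic value as bytes: the value paired with its tag."""
    return pickle.dumps((value, tag))

def from_persistent(data, expected_tag):
    """Rebuild the value, failing if the stored tag is not the expected one.

    Note: pickle.loads must only be used on data from a trusted source.
    """
    value, tag = pickle.loads(data)
    if tag != expected_tag:
        raise TypeError(f"stored tag {tag!r} does not match {expected_tag!r}")
    return value

# A structure with a shared substructure and a cycle.
shared = [1, 2, 3]
node = {"left": shared, "right": shared}
node["self"] = node

data = to_persistent(node, ("record", "Node"))
restored = from_persistent(data, ("record", "Node"))
```

After the round trip, `restored["left"]` and `restored["right"]` are again one and the same list, and `restored["self"]` refers back to `restored`, so both the sharing and the circularity survive.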

Fortunately, a large amount of energy has already been devoted to this problem, par- 
ticularly in the Lisp community. Many Lisp systems support "fasl" files, which can be 
used to store arbitrary heap structures. (See [24] for a description of a typical fasl format. 
The idea goes back to 1974, at least.) 

A mechanism for "pickling" heap structures in Cedar/Mesa was designed and imple- 
mented by Rovner and Maxwell, probably in 1982 or 1983. A variant of their algorithm, 
due to Lampson, is heavily used in the Modula-2+ programming environment at the DEC 
Systems Research Center. Another scheme was implemented as part of Tartan Labs' In- 
terface Description Language [29]. This scheme was based on earlier work by Newcomer 
and Dill on the "Production Quality Compiler-Compiler" project at CMU. 

7.2 Type Matching 

Although the particular language constructs described in this paper have not been im- 
plemented, various schemes for dynamic typing in statically typed languages have existed 
for some time (see Section 2). Figure 5 gives a rough classification of several languages 
according to the amount of work involved in comparing types and the presence or absence 
of subtyping. 

                                  Without subtyping       With subtyping

    Name equivalence              Modula-2+, CLU, etc.    Simula-67
    Rigid Structural Equivalence                          Modula-3, Cedar
    Structural Equivalence                                Amber
    Pattern variables             Our language            ?

            Figure 5: Taxonomy of languages with dynamic values 



Type matching is simplest in languages like CLU [20] and Modula-2+ [33], where the 
construct corresponding to our typecase allows only exact matches (no pattern variables), 
and where equivalence of types is "by name." In Modula-2+, for example, the type tags 
of dynamic values are just unique identifiers and type matching is a check for equality. 

When subtyping is involved, matching becomes more complicated. For example, 
Simula-67 uses name equivalence for type matching so type tags can again be represented 
as atoms. But to find out whether a given object's type tag matches an arm of a when 
clause (which dynamically checks whether an object's actual type is in a given subclass of 
its statically apparent type), it is necessary to scan the superclasses of the object's actual 
class. This is reasonably efficient, since the subclass hierarchy tends to be shallow and 
only a few instructions are required to check each level. 
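
Both cases can be sketched in a few lines of Python. This is an illustration under assumed names (`TypeTag`, `exact_match`, `subclass_match`), not the implementation of any of the languages mentioned: tags are atoms compared by identity, and the subtyping case walks the chain of declared superclasses.

```python
class TypeTag:
    """An atomic type tag; two tags are equal only if they are the same object."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # declared superclass, or None at the root

def exact_match(tag, guard):
    """Name equivalence without subtyping: a single identity comparison."""
    return tag is guard

def subclass_match(tag, guard):
    """Name equivalence with subtyping: scan the superclasses of tag."""
    while tag is not None:
        if tag is guard:
            return True
        tag = tag.parent
    return False

vehicle = TypeTag("Vehicle")
car = TypeTag("Car", parent=vehicle)
truck = TypeTag("Truck", parent=vehicle)
```

The loop in `subclass_match` does one comparison per level of the hierarchy, which is why a shallow hierarchy keeps the test cheap.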

It is also possible to have a language with structural equivalence where type matching 
is still based on simple comparison of atoms. Modula-3, for example, includes a type 
similar to Dynamic, a typecase construct that allows only matching of complete type 
expressions (no pattern variables), and a notion of subtyping [8, 9]. (We do not know of 
a language with structural equivalence, Dynamic, and exact type matching, but without 
subtyping.) Efficient implementation of typecase is possible in Modula-3 because the 
rules for structural matching of subtypes are "rigid" — subtyping is based on an explicit 
hierarchy. Thus, a unique identifier can still be associated with each equivalence class of 
types, and, as in Simula-67, match can check that a given tag is a subtype of a typecase 
guard by quickly scanning a precompiled list of superclasses of the tag. 

Amber's notion of "structural subtyping" [7] requires a more sophisticated represen- 
tation of type tags. The subtype hierarchy is not based on explicit declarations, but on 
structural similarities that allow one type to be safely used wherever another is expected. 
(For example, a record type with two fields a and b is a subtype of another with just the 
field a, as long as the type of a in the first is a subtype of the type of a in the second.) 
This means that the set of supertypes of a given type cannot be precomputed by the com- 
piler. Instead, Dynamic values must be tagged with the entire structural representation of 
their types — the same representation that the compiler uses internally for typechecking. 
(In fact, because the Amber compiler is bootstrapped, the representations are exactly the 
same.) The match function must compare the structure of the type tag with that of each 
type pattern. 

The language described in this paper also requires a structural representation of types — 
not because of subtyping, but because of the pattern variables in typecase guards. In order 
to determine whether there is a substitution of type expressions for pattern variables that 
makes a given pattern equal to a given type tag, it is necessary to actually match the two 
structurally, filling in bindings for pattern variables from the corresponding subterms in 
the type tag. This is exactly the "first-order matching" problem. We can imagine speeding 
up this structural matching of type expressions by precompiling code to match an unknown 
expression against a given known expression, using techniques familiar from compilers for 
ML [21]. 

The last box in Figure 5 represents an open question: Is there a sensible way to combine 
some notion of subtyping with a typecase construct that includes pattern variables? The 
problems here are quite similar to those that arise in combining subtyping with polymor- 
phism (for example, the difficulties in finding principal types). 

8 Conclusions 

Dynamic typing is necessary for embedding a statically typed language into a dynamically 
typed environment, while preserving strong typing. We have explored the syntax, 
operational semantics, and denotational semantics of a typed lambda-calculus with the type 
Dynamic. We hope that after a long but rather obscure existence, Dynamic may become a 
standard programming language feature. 